Overview

Dataset statistics

Number of variables: 27
Number of observations: 4,803
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 10.0 MiB
Average record size in memory: 2.1 KiB
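The counts above can be recomputed for any table of records. The following is a minimal sketch in plain Python, assuming the rows are held as a list of dicts; the `dataset_statistics` helper and the four sample rows are hypothetical and are not taken from the profiled dataset.

```python
def dataset_statistics(rows):
    """Summarise a table given as a list of dicts (one dict per observation)."""
    columns = sorted({key for row in rows for key in row})
    n_obs = len(rows)
    n_cells = n_obs * len(columns)
    # A cell counts as missing when the key is absent or its value is None.
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    # A row is a duplicate when every column matches an earlier row.
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "variables": len(columns),
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


# Hypothetical sample rows, for illustration only.
rows = [
    {"id": 1, "budget": "100", "gross": 0},
    {"id": 2, "budget": "200", "gross": 50},
    {"id": 3, "budget": None, "gross": 0},
    {"id": 3, "budget": None, "gross": 0},
]
stats = dataset_statistics(rows)
```

Memory figures are left out of the sketch because they depend on how the profiling tool stores each column.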

Variable types

CAT: 19
NUM: 6
UNSUPPORTED: 2

Warnings

budget has a high cardinality: 437 distinct values
genres has a high cardinality: 1175 distinct values
homepage has a high cardinality: 1692 distinct values
plot_keywords has a high cardinality: 4222 distinct values
original_title has a high cardinality: 4801 distinct values
overview has a high cardinality: 4801 distinct values
production_companies has a high cardinality: 3697 distinct values
production_countries has a high cardinality: 469 distinct values
release_date has a high cardinality: 3281 distinct values
spoken_languages has a high cardinality: 544 distinct values
tagline has a high cardinality: 3945 distinct values
movie_title has a high cardinality: 4800 distinct values
country has a high cardinality: 71 distinct values
director_name has a high cardinality: 2350 distinct values
actor_1_name has a high cardinality: 2721 distinct values
actor_2_name has a high cardinality: 3096 distinct values
actor_3_name has a high cardinality: 3373 distinct values
original_title is uniformly distributed
overview is uniformly distributed
release_date is uniformly distributed
movie_title is uniformly distributed
Unnamed: 0 has unique values
id has unique values
duration is an unsupported type, check if it needs cleaning or further analysis
title_year is an unsupported type, check if it needs cleaning or further analysis
gross has 1427 (29.7%) zeros
vote_average has 63 (1.3%) zeros
num_voted_users has 62 (1.3%) zeros
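Warnings of this kind can be approximated with a few simple rules. The sketch below is an illustration, not the profiling tool's own logic. The cardinality threshold of 50 is an assumption that is merely consistent with this report, where country (71 distinct values) is flagged and language (47) is not. The 1% minimum share of zeros is likewise a hypothetical cut-off, and `column_warnings` is an invented helper name.

```python
def column_warnings(name, values, cardinality_threshold=50, min_zero_share=0.01):
    """Return warning strings for one column, in the spirit of the list above.

    Both thresholds are assumptions made for this sketch.
    """
    warnings = []
    n = len(values)
    if n == 0:
        return warnings
    distinct = len(set(values))
    if all(isinstance(v, str) for v in values):
        # Text columns: flag a large number of distinct categories.
        if distinct > cardinality_threshold:
            warnings.append(
                f"{name} has a high cardinality: {distinct} distinct values"
            )
    else:
        # Numeric columns: flag all-distinct values and a notable share of zeros.
        if distinct == n:
            warnings.append(f"{name} has unique values")
        zeros = sum(1 for v in values if v == 0)
        if zeros / n >= min_zero_share:
            warnings.append(f"{name} has {zeros} ({100.0 * zeros / n:.1f}%) zeros")
    return warnings


# Hypothetical inputs, for illustration only.
gross_warnings = column_warnings("gross", [0, 0, 5, 7])
id_warnings = column_warnings("id", [1, 2, 3])
title_warnings = column_warnings("title", [str(i) for i in range(60)])
```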

Reproduction

Analysis started: 2020-12-16 15:46:00.473061
Analysis finished: 2020-12-16 15:46:19.333918
Duration: 18.86 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

Unnamed: 0
Real number (ℝ≥0)

UNIQUE

Distinct: 4803
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2401
Minimum: 0
Maximum: 4802
Zeros: 1
Zeros (%): < 0.1%
Memory size: 37.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 240.1
Q1: 1200.5
Median: 2401
Q3: 3601.5
95-th percentile: 4561.9
Maximum: 4802
Range: 4802
Interquartile range (IQR): 2401
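The quantiles above are what linear interpolation between neighbouring ranks gives for an index column running from 0 to 4802. The sketch below reproduces them in plain Python; the `quantile` helper is written for this illustration, and the choice of linear interpolation is an assumption inferred from the reported values.

```python
def quantile(sorted_values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower]
    )


values = list(range(4803))  # the index column holds 0, 1, ..., 4802

p05 = quantile(values, 0.05)
q1 = quantile(values, 0.25)
median = quantile(values, 0.50)
q3 = quantile(values, 0.75)
p95 = quantile(values, 0.95)
iqr = q3 - q1
value_range = values[-1] - values[0]
```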

Descriptive statistics

Standard deviation: 1386.651002
Coefficient of variation (CV): 0.5775306129
Kurtosis: -1.2
Mean: 2401
Median Absolute Deviation (MAD): 1201
Skewness: 0
Sum: 11532003
Variance: 1922801
Monotonicity: Strictly increasing
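Most of these descriptive statistics can be checked with the standard library. The sketch below assumes the column is exactly 0 to 4802 and uses the sample variance (n - 1 denominator), which is the convention that matches the reported figure. Skewness and kurtosis are omitted because their bias corrections vary between tools.

```python
import math
import statistics

values = list(range(4803))  # the index column holds 0, 1, ..., 4802

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std_dev = math.sqrt(variance)
cv = std_dev / mean  # coefficient of variation
median = statistics.median(values)
# Median absolute deviation: median distance of each value from the median.
mad = statistics.median(abs(v - median) for v in values)
# Strictly increasing means every value exceeds its predecessor.
strictly_increasing = all(a < b for a, b in zip(values, values[1:]))
```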
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
2047                 1      < 0.1%
2572                 1      < 0.1%
4623                 1      < 0.1%
2576                 1      < 0.1%
529                  1      < 0.1%
4627                 1      < 0.1%
2580                 1      < 0.1%
533                  1      < 0.1%
4631                 1      < 0.1%
2584                 1      < 0.1%
537                  1      < 0.1%
4635                 1      < 0.1%
2588                 1      < 0.1%
541                  1      < 0.1%
4639                 1      < 0.1%
2592                 1      < 0.1%
545                  1      < 0.1%
525                  1      < 0.1%
4619                 1      < 0.1%
2596                 1      < 0.1%
521                  1      < 0.1%
501                  1      < 0.1%
4599                 1      < 0.1%
2552                 1      < 0.1%
505                  1      < 0.1%
Other values (4778)  4778   99.5%
 
Value  Count  Frequency (%)
0      1      < 0.1%
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%
 
Value  Count  Frequency (%)
4802   1      < 0.1%
4801   1      < 0.1%
4800   1      < 0.1%
4799   1      < 0.1%
4798   1      < 0.1%
4797   1      < 0.1%
4796   1      < 0.1%
4795   1      < 0.1%
4794   1      < 0.1%
4793   1      < 0.1%
 

budget
Categorical

HIGH CARDINALITY

Distinct: 437
Distinct (%): 9.1%
Missing: 0
Missing (%): 0.0%
Memory size: 37.6 KiB
Value               Count
0                   1037
20000000            144
30000000            128
25000000            126
40000000            123
Other values (432)  3245
Value               Count  Frequency (%)
0                   1037   21.6%
20000000            144    3.0%
30000000            128    2.7%
25000000            126    2.6%
40000000            123    2.6%
15000000            120    2.5%
35000000            102    2.1%
50000000            101    2.1%
10000000            101    2.1%
60000000            86     1.8%
5000000             84     1.7%
12000000            79     1.6%
8000000             62     1.3%
70000000            60     1.2%
80000000            59     1.2%
18000000            59     1.2%
7000000             55     1.1%
6000000             55     1.1%
2000000             54     1.1%
45000000            52     1.1%
3000000             51     1.1%
4000000             49     1.0%
1000000             48     1.0%
75000000            47     1.0%
55000000            45     0.9%
Other values (412)  1876   39.1%
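A frequency table with a trailing "Other values" row, like the one above, can be built with `collections.Counter`. The sketch below is a minimal illustration; the `value_counts` helper and the short `budgets` sample are hypothetical.

```python
from collections import Counter


def value_counts(values, top=25):
    """Return (value, count, percent) rows for the most common values.

    Values beyond the first `top` are folded into one "Other values (k)" row,
    where k is the number of distinct values that were folded.
    """
    counts = Counter(values)
    total = len(values)
    rows = [
        (value, count, 100.0 * count / total)
        for value, count in counts.most_common(top)
    ]
    shown = sum(count for _, count, _ in rows)
    remaining_distinct = len(counts) - len(rows)
    if remaining_distinct > 0:
        rest = total - shown
        rows.append(
            (f"Other values ({remaining_distinct})", rest, 100.0 * rest / total)
        )
    return rows


# Hypothetical sample: budgets are kept as strings, as in the report.
budgets = ["0", "0", "0", "20000000", "20000000", "30000000", "25000000"]
table = value_counts(budgets, top=2)
```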
 
Frequencies of value counts

Unique

Unique: 232
Unique (%): 4.8%
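The report separates distinct values from unique ones. The figures are consistent with "unique" meaning values that occur exactly once; for original_title, 4801 distinct values with two titles appearing twice leave 4799 unique ones. A minimal sketch under that reading, with a hypothetical helper name and sample:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique): all different values vs. values seen once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for count in counts.values() if count == 1)
    return distinct, unique


# Hypothetical sample: "0" repeats, the other two values occur once.
distinct, unique = distinct_and_unique(["0", "0", "20000000", "30000000"])
unique_pct = 100.0 * unique / 4  # share is taken over all observations
```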
Histogram of lengths of the category

Length

Max length: 10
Median length: 8
Mean length: 6.271913387
Min length: 1
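The length statistics describe the number of characters in each category value. A minimal sketch of the computation follows; `length_summary` and the sample values are hypothetical.

```python
import statistics


def length_summary(values):
    """Summarise the character lengths of a list of strings."""
    lengths = [len(value) for value in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
    }


# Hypothetical sample of budget strings with lengths 1, 8 and 7.
summary = length_summary(["0", "20000000", "5000000"])
```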

Overview of Unicode Properties

Unique unicode characters: 11
Unique unicode categories: 2
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
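As an illustration of that idea, Python's standard `unicodedata` module exposes the general category of each character. The sketch below tallies categories over a list of strings. The mapping from two-letter codes to readable names covers only a handful of categories and is written for this example; the sample strings are hypothetical.

```python
import unicodedata
from collections import Counter

# Readable names for a few general-category codes (not exhaustive).
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Sm": "Math Symbol",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Hypothetical sample strings in the style of the profiled columns.
counts = category_counts(["20000000", "Drama|Romance", "tv movie"])
```

Note that the vertical bar is classified as a Math Symbol, which is why that category appears for pipe-separated columns.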

Most occurring characters

Value  Count  Frequency (%)
0      23650  78.5%
5      1343   4.5%
1      1266   4.2%
2      1037   3.4%
3      720    2.4%
4      520    1.7%
6      469    1.6%
8      467    1.6%
7      426    1.4%
9      225    0.7%
t      1      < 0.1%
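The single lowercase t among more than thirty thousand digits suggests that at least one budget entry is not a plain number, which may be why the column is reported as categorical. A minimal sketch for locating such entries follows; the helper name and the sample values, including the malformed one, are hypothetical.

```python
def non_numeric_entries(values):
    """Return (position, value) pairs whose text is not a run of ASCII digits."""
    return [
        (index, value)
        for index, value in enumerate(values)
        if not (value.isascii() and value.isdigit())
    ]


# Hypothetical sample; the malformed third entry is invented for illustration.
budgets = ["0", "20000000", "3000000t", "25000000"]
bad = non_numeric_entries(budgets)
```

Once the offending rows are found and corrected, the column could be converted to integers and profiled as numeric.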
 

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    30123  > 99.9%
Lowercase Letter  1      < 0.1%
 

Most frequent Decimal Number characters

Value  Count  Frequency (%)
0      23650  78.5%
5      1343   4.5%
1      1266   4.2%
2      1037   3.4%
3      720    2.4%
4      520    1.7%
6      469    1.6%
8      467    1.6%
7      426    1.4%
9      225    0.7%
 

Most frequent Lowercase Letter characters

Value  Count  Frequency (%)
t      1      100.0%
 

Most occurring scripts

Value   Count  Frequency (%)
Common  30123  > 99.9%
Latin   1      < 0.1%
 

Most frequent Common characters

Value  Count  Frequency (%)
0      23650  78.5%
5      1343   4.5%
1      1266   4.2%
2      1037   3.4%
3      720    2.4%
4      520    1.7%
6      469    1.6%
8      467    1.6%
7      426    1.4%
9      225    0.7%
 

Most frequent Latin characters

Value  Count  Frequency (%)
t      1      100.0%
 

Most occurring blocks

Value  Count  Frequency (%)
ASCII  30124  100.0%
 

Most frequent ASCII characters

Value  Count  Frequency (%)
0      23650  78.5%
5      1343   4.5%
1      1266   4.2%
2      1037   3.4%
3      720    2.4%
4      520    1.7%
6      469    1.6%
8      467    1.6%
7      426    1.4%
9      225    0.7%
t      1      < 0.1%
 

genres
Categorical

HIGH CARDINALITY

Distinct: 1175
Distinct (%): 24.5%
Missing: 0
Missing (%): 0.0%
Memory size: 37.6 KiB
Value                Count
Drama                370
Comedy               282
Drama|Romance        164
Comedy|Romance       144
Comedy|Drama         142
Other values (1170)  3701
Value                        Count  Frequency (%)
Drama                        370    7.7%
Comedy                       282    5.9%
Drama|Romance                164    3.4%
Comedy|Romance               144    3.0%
Comedy|Drama                 142    3.0%
Comedy|Drama|Romance         109    2.3%
Horror|Thriller              88     1.8%
Documentary                  68     1.4%
Horror                       64     1.3%
Drama|Thriller               62     1.3%
Drama|Comedy                 46     1.0%
Crime|Drama|Thriller         43     0.9%
Action|Thriller              40     0.8%
Drama|History                37     0.8%
Action|Comedy                36     0.7%
Comedy|Family                36     0.7%
Drama|Comedy|Romance         35     0.7%
Crime|Drama                  33     0.7%
Comedy|Crime                 30     0.6%
Action|Crime|Thriller        30     0.6%
UNK                          28     0.6%
Drama|Crime                  26     0.5%
Animation|Family             25     0.5%
Action|Crime|Drama|Thriller  25     0.5%
Adventure|Action|Thriller    24     0.5%
Other values (1150)          2816   58.6%
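Much of the cardinality here comes from genres being stored as pipe-separated combinations, with the order not normalised (both Comedy|Drama and Drama|Comedy appear above). Splitting on the separator gives per-genre counts. The sketch below is a minimal illustration; treating UNK as a placeholder for a missing value is an assumption, and the helper name and sample are hypothetical.

```python
from collections import Counter


def genre_counts(values, separator="|", missing="UNK"):
    """Count individual genres in strings such as "Drama|Romance".

    Entries equal to `missing` are skipped; reading "UNK" as a
    missing-value placeholder is an assumption of this sketch.
    """
    counts = Counter()
    for value in values:
        if value == missing:
            continue
        counts.update(part.strip() for part in value.split(separator))
    return counts


# Hypothetical sample in the style of the genres column.
counts = genre_counts(["Drama", "Drama|Romance", "Comedy|Drama", "Drama|Comedy", "UNK"])
```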
 
Frequencies of value counts

Unique

Unique: 739
Unique (%): 15.4%
Histogram of lengths of the category

Length

Max length: 64
Median length: 18
Mean length: 18.69643973
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 33
Unique unicode categories: 4
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
r88039.8%
 
e79008.8%
 
|73858.2%
 
a73378.2%
 
m64667.2%
 
i61346.8%
 
o59266.6%
 
n50265.6%
 
c39484.4%
 
t38744.3%
 
y36624.1%
 
l30613.4%
 
d25122.8%
 
C24182.7%
 
D24072.7%
 
A21782.4%
 
F15061.7%
 
T12821.4%
 
h12741.4%
 
s12361.4%
 
u10851.2%
 
R8941.0%
 
v7980.9%
 
H7160.8%
 
5430.6%
 
Other values (8)14281.6%
 

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  69076  76.9%
Uppercase Letter  12795  14.2%
Math Symbol       7385   8.2%
Space Separator   543    0.6%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
C241818.9%
 
D240718.8%
 
A217817.0%
 
F150611.8%
 
T128210.0%
 
R8947.0%
 
H7165.6%
 
M5414.2%
 
S5354.2%
 
W2261.8%
 
U280.2%
 
N280.2%
 
K280.2%
 
V80.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
r880312.7%
 
e790011.4%
 
a733710.6%
 
m64669.4%
 
i61348.9%
 
o59268.6%
 
n50267.3%
 
c39485.7%
 
t38745.6%
 
y36625.3%
 
l30614.4%
 
d25123.6%
 
h12741.8%
 
s12361.8%
 
u10851.6%
 
v7981.2%
 
g34< 0.1%
 

Most frequent Math Symbol characters

ValueCountFrequency (%) 
|7385100.0%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
543100.0%
 

Most occurring scripts

Value   Count  Frequency (%)
Latin   81871  91.2%
Common  7928   8.8%
 

Most frequent Latin characters

ValueCountFrequency (%) 
r880310.8%
 
e79009.6%
 
a73379.0%
 
m64667.9%
 
i61347.5%
 
o59267.2%
 
n50266.1%
 
c39484.8%
 
t38744.7%
 
y36624.5%
 
l30613.7%
 
d25123.1%
 
C24183.0%
 
D24072.9%
 
A21782.7%
 
F15061.8%
 
T12821.6%
 
h12741.6%
 
s12361.5%
 
u10851.3%
 
R8941.1%
 
v7981.0%
 
H7160.9%
 
M5410.7%
 
S5350.7%
 
Other values (6)3520.4%
 

Most frequent Common characters

ValueCountFrequency (%) 
|738593.2%
 
5436.8%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII89799100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
r88039.8%
 
e79008.8%
 
|73858.2%
 
a73378.2%
 
m64667.2%
 
i61346.8%
 
o59266.6%
 
n50265.6%
 
c39484.4%
 
t38744.3%
 
y36624.1%
 
l30613.4%
 
d25122.8%
 
C24182.7%
 
D24072.7%
 
A21782.4%
 
F15061.7%
 
T12821.4%
 
h12741.4%
 
s12361.4%
 
u10851.2%
 
R8941.0%
 
v7980.9%
 
H7160.8%
 
5430.6%
 
Other values (8)14281.6%
 

homepage
Categorical

HIGH CARDINALITY

Distinct: 1692
Distinct (%): 35.2%
Missing: 0
Missing (%): 0.0%
Memory size: 37.6 KiB
Value                              Count
UNK                                3091
http://www.missionimpossible.com/  4
http://www.thehungergames.movie/   4
http://www.thehobbit.com/          3
http://www.transformersmovie.com/  3
Other values (1687)                1698
Value                                                                    Count  Frequency (%)
UNK                                                                      3091   64.4%
http://www.missionimpossible.com/                                        4      0.1%
http://www.thehungergames.movie/                                         4      0.1%
http://www.thehobbit.com/                                                3      0.1%
http://www.transformersmovie.com/                                        3      0.1%
http://www.kungfupanda.com/                                              3      0.1%
http://www.ironmanmovie.com/                                             2      < 0.1%
http://www.howtotrainyourdragon.com/                                     2      < 0.1%
http://www.lordoftherings.net/                                           2      < 0.1%
http://www.indianajones.com                                              2      < 0.1%
http://www.workandtheglory.com/                                          2      < 0.1%
http://www.munkyourself.com/                                             2      < 0.1%
http://disney.go.com/disneypictures/pirates/                             2      < 0.1%
http://www.riomovies.com/                                                2      < 0.1%
http://www.theamazingspiderman.com                                       2      < 0.1%
http://www.kickstarter.com/projects/1094772583/the-canyons               1      < 0.1%
http://www.mgm.com/view/movie/232/Die-Another-Day/                       1      < 0.1%
http://robzombie.com/movies/the-lords-of-salem/                          1      < 0.1%
http://paulblartmallcop.com/                                             1      < 0.1%
http://www.beastlythemovie.com/                                          1      < 0.1%
http://shanghaicalling.com/                                              1      < 0.1%
http://invictusmovie.warnerbros.com                                      1      < 0.1%
http://www.findnumberfour.com/                                           1      < 0.1%
http://www.starwars.com/films/star-wars-episode-ii-attack-of-the-clones  1      < 0.1%
http://www.gonegirlmovie.com/                                            1      < 0.1%
Other values (1667)                                                      1667   34.7%
 
Frequencies of value counts

Unique

Unique: 1677
Unique (%): 34.9%
Histogram of lengths of the category

Length

Max length: 138
Median length: 3
Mean length: 14.91234645
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 78
Unique unicode categories: 10
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
t57078.0%
 
/57018.0%
 
e44676.2%
 
o44386.2%
 
w44186.2%
 
m34924.9%
 
.33934.7%
 
h31104.3%
 
N31004.3%
 
U30944.3%
 
K30934.3%
 
i30174.2%
 
c25433.6%
 
p23473.3%
 
s22803.2%
 
r22093.1%
 
a21963.1%
 
n20512.9%
 
:17132.4%
 
l14142.0%
 
v11621.6%
 
d10811.5%
 
u7931.1%
 
g6851.0%
 
f6841.0%
 
Other values (53)34364.8%
 

Most occurring categories

Value                  Count  Frequency (%)
Lowercase Letter       49986  69.8%
Other Punctuation      10851  15.1%
Uppercase Letter       9518   13.3%
Dash Punctuation       613    0.9%
Decimal Number         557    0.8%
Connector Punctuation  68     0.1%
Math Symbol            22     < 0.1%
Open Punctuation       4      < 0.1%
Close Punctuation      4      < 0.1%
Space Separator        1      < 0.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
t570711.4%
 
e44678.9%
 
o44388.9%
 
w44188.8%
 
m34927.0%
 
h31106.2%
 
i30176.0%
 
c25435.1%
 
p23474.7%
 
s22804.6%
 
r22094.4%
 
a21964.4%
 
n20514.1%
 
l14142.8%
 
v11622.3%
 
d10812.2%
 
u7931.6%
 
g6851.4%
 
f6841.4%
 
y5851.2%
 
b5631.1%
 
k3570.7%
 
x1910.4%
 
j1230.2%
 
z610.1%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
/570152.5%
 
.339331.3%
 
:171315.8%
 
#180.2%
 
?180.2%
 
%3< 0.1%
 
&3< 0.1%
 
!1< 0.1%
 
,1< 0.1%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-613100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
210418.7%
 
18715.6%
 
08415.1%
 
37112.7%
 
9447.9%
 
5356.3%
 
4346.1%
 
7335.9%
 
6335.9%
 
8325.7%
 

Most frequent Connector Punctuation characters

ValueCountFrequency (%) 
_68100.0%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
N310032.6%
 
U309432.5%
 
K309332.5%
 
D220.2%
 
A200.2%
 
T200.2%
 
M190.2%
 
S170.2%
 
E140.1%
 
L130.1%
 
G120.1%
 
W100.1%
 
H100.1%
 
B100.1%
 
F100.1%
 
C90.1%
 
I90.1%
 
R80.1%
 
O80.1%
 
P60.1%
 
V50.1%
 
Y50.1%
 
J2< 0.1%
 
Q1< 0.1%
 
Z1< 0.1%
 

Most frequent Math Symbol characters

ValueCountFrequency (%) 
=22100.0%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
(375.0%
 
{125.0%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
)375.0%
 
}125.0%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
1100.0%
 

Most occurring scripts

Value   Count  Frequency (%)
Latin   59504  83.1%
Common  12120  16.9%
 

Most frequent Latin characters

ValueCountFrequency (%) 
t57079.6%
 
e44677.5%
 
o44387.5%
 
w44187.4%
 
m34925.9%
 
h31105.2%
 
N31005.2%
 
U30945.2%
 
K30935.2%
 
i30175.1%
 
c25434.3%
 
p23473.9%
 
s22803.8%
 
r22093.7%
 
a21963.7%
 
n20513.4%
 
l14142.4%
 
v11622.0%
 
d10811.8%
 
u7931.3%
 
g6851.2%
 
f6841.1%
 
y5851.0%
 
b5630.9%
 
k3570.6%
 
Other values (26)6181.0%
 

Most frequent Common characters

ValueCountFrequency (%) 
/570147.0%
 
.339328.0%
 
:171314.1%
 
-6135.1%
 
21040.9%
 
1870.7%
 
0840.7%
 
3710.6%
 
_680.6%
 
9440.4%
 
5350.3%
 
4340.3%
 
7330.3%
 
6330.3%
 
8320.3%
 
=220.2%
 
#180.1%
 
?180.1%
 
(3< 0.1%
 
)3< 0.1%
 
%3< 0.1%
 
&3< 0.1%
 
!1< 0.1%
 
1< 0.1%
 
,1< 0.1%
 
Other values (2)2< 0.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII71624100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
t57078.0%
 
/57018.0%
 
e44676.2%
 
o44386.2%
 
w44186.2%
 
m34924.9%
 
.33934.7%
 
h31104.3%
 
N31004.3%
 
U30944.3%
 
K30934.3%
 
i30174.2%
 
c25433.6%
 
p23473.3%
 
s22803.2%
 
r22093.1%
 
a21963.1%
 
n20512.9%
 
:17132.4%
 
l14142.0%
 
v11621.6%
 
d10811.5%
 
u7931.1%
 
g6851.0%
 
f6841.0%
 
Other values (53)34364.8%
 

id
Real number (ℝ≥0)

UNIQUE

Distinct: 4803
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 57165.48428
Minimum: 5
Maximum: 459488
Zeros: 0
Zeros (%): 0.0%
Memory size: 37.6 KiB

Quantile statistics

Minimum: 5
5-th percentile: 578.1
Q1: 9014.5
Median: 14629
Q3: 58610.5
95-th percentile: 285779
Maximum: 459488
Range: 459483
Interquartile range (IQR): 49596

Descriptive statistics

Standard deviation: 88694.61403
Coefficient of variation (CV): 1.551541374
Kurtosis: 3.346747662
Mean: 57165.48428
Median Absolute Deviation (MAD): 12920
Skewness: 2.072080474
Sum: 274565821
Variance: 7866734559
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
45054                1      < 0.1%
19084                1      < 0.1%
41144                1      < 0.1%
13187                1      < 0.1%
8849                 1      < 0.1%
29339                1      < 0.1%
286939               1      < 0.1%
673                  1      < 0.1%
10368                1      < 0.1%
14438                1      < 0.1%
68202                1      < 0.1%
8869                 1      < 0.1%
72358                1      < 0.1%
299687               1      < 0.1%
681                  1      < 0.1%
260778               1      < 0.1%
4518                 1      < 0.1%
109417               1      < 0.1%
50942                1      < 0.1%
178862               1      < 0.1%
8841                 1      < 0.1%
1381                 1      < 0.1%
218                  1      < 0.1%
39538                1      < 0.1%
4723                 1      < 0.1%
Other values (4778)  4778   99.5%
 
Value  Count  Frequency (%)
5      1      < 0.1%
11     1      < 0.1%
12     1      < 0.1%
13     1      < 0.1%
14     1      < 0.1%
16     1      < 0.1%
18     1      < 0.1%
19     1      < 0.1%
20     1      < 0.1%
22     1      < 0.1%
 
Value   Count  Frequency (%)
459488  1      < 0.1%
447027  1      < 0.1%
433715  1      < 0.1%
426469  1      < 0.1%
426067  1      < 0.1%
417859  1      < 0.1%
408429  1      < 0.1%
407887  1      < 0.1%
402515  1      < 0.1%
396152  1      < 0.1%
 

plot_keywords
Categorical

HIGH CARDINALITY

Distinct: 4222
Distinct (%): 87.9%
Missing: 0
Missing (%): 0.0%
Memory size: 37.6 KiB
Value                 Count
UNK                   412
independent film      55
woman director        42
duringcreditsstinger  15
sport                 13
Other values (4217)   4266
Value                                                         Count  Frequency (%)
UNK                                                           412    8.6%
independent film                                              55     1.1%
woman director                                                42     0.9%
duringcreditsstinger                                          15     0.3%
sport                                                         13     0.3%
independent film|woman director                               10     0.2%
musical                                                       5      0.1%
biography                                                     5      0.1%
suspense                                                      5      0.1%
dystopia                                                      3      0.1%
holiday|christmas                                             3      0.1%
christian                                                     3      0.1%
gay                                                           3      0.1%
superhero                                                     3      0.1%
mumblecore                                                    3      0.1%
aftercreditsstinger                                           3      0.1%
tv movie                                                      2      < 0.1%
bank robbery                                                  2      < 0.1%
sport|independent film                                        2      < 0.1%
blaxploitation                                                2      < 0.1%
aftercreditsstinger|duringcreditsstinger                      2      < 0.1%
mutant|marvel comic|superhero|based on comic book|superhuman  2      < 0.1%
soccer                                                        2      < 0.1%
baseball|sport                                                2      < 0.1%
road movie                                                    2      < 0.1%
Other values (4197)                                           4202   87.5%
 
Frequencies of value counts

Unique

Unique: 4192
Unique (%): 87.3%
Histogram of lengths of the category

Length

Max length: 1254
Median length: 68
Mean length: 83.35956694
Min length: 2

Overview of Unicode Properties

Unique unicode characters: 81
Unique unicode categories: 10
Unique unicode scripts: 4
Unique unicode blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
e378569.5%
 
|318037.9%
 
i300177.5%
 
a291797.3%
 
r282537.1%
 
n251306.3%
 
o246806.2%
 
t239916.0%
 
s224545.6%
 
183634.6%
 
l171084.3%
 
c145173.6%
 
d130813.3%
 
m109022.7%
 
u97852.4%
 
p97312.4%
 
g97132.4%
 
h96112.4%
 
f61631.5%
 
y59021.5%
 
b57491.4%
 
v41571.0%
 
w37790.9%
 
k30040.8%
 
x10930.3%
 
Other values (56)43551.1%
 

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   347371  86.8%
Math Symbol        31804   7.9%
Space Separator    18389   4.6%
Uppercase Letter   1237    0.3%
Decimal Number     692     0.2%
Dash Punctuation   451     0.1%
Other Punctuation  371     0.1%
Open Punctuation   23      < 0.1%
Close Punctuation  23      < 0.1%
Other Letter       15      < 0.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e3785610.9%
 
i300178.6%
 
a291798.4%
 
r282538.1%
 
n251307.2%
 
o246807.1%
 
t239916.9%
 
s224546.5%
 
l171084.9%
 
c145174.2%
 
d130813.8%
 
m109023.1%
 
u97852.8%
 
p97312.8%
 
g97132.8%
 
h96112.8%
 
f61631.8%
 
y59021.7%
 
b57491.7%
 
v41571.2%
 
w37791.1%
 
k30040.9%
 
x10930.3%
 
j7430.2%
 
z4620.1%
 
Other values (14)3110.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
1836399.9%
 
 260.1%
 

Most frequent Math Symbol characters

ValueCountFrequency (%) 
|31803> 99.9%
 
+1< 0.1%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
117625.4%
 
914521.0%
 
012017.3%
 
310615.3%
 
7426.1%
 
6324.6%
 
8233.3%
 
5223.2%
 
2182.6%
 
481.2%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
.18249.1%
 
'17146.1%
 
,92.4%
 
"41.1%
 
/20.5%
 
&20.5%
 
*10.3%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-44598.7%
 
61.3%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
(23100.0%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
)23100.0%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
U41233.3%
 
N41233.3%
 
K41233.3%
 
Γ10.1%
 

Most frequent Other Letter characters

ValueCountFrequency (%) 
320.0%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 

Most occurring scripts

Value   Count   Frequency (%)
Latin   348606  87.1%
Common  51753   12.9%
Han     15      < 0.1%
Greek   2       < 0.1%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e3785610.9%
 
i300178.6%
 
a291798.4%
 
r282538.1%
 
n251307.2%
 
o246807.1%
 
t239916.9%
 
s224546.4%
 
l171084.9%
 
c145174.2%
 
d130813.8%
 
m109023.1%
 
u97852.8%
 
p97312.8%
 
g97132.8%
 
h96112.8%
 
f61631.8%
 
y59021.7%
 
b57491.6%
 
v41571.2%
 
w37791.1%
 
k30040.9%
 
x10930.3%
 
j7430.2%
 
z4620.1%
 
Other values (16)15460.4%
 

Most frequent Common characters

ValueCountFrequency (%) 
|3180361.5%
 
1836335.5%
 
-4450.9%
 
.1820.4%
 
11760.3%
 
'1710.3%
 
91450.3%
 
01200.2%
 
31060.2%
 
7420.1%
 
6320.1%
 
 260.1%
 
(23< 0.1%
 
)23< 0.1%
 
823< 0.1%
 
522< 0.1%
 
218< 0.1%
 
,9< 0.1%
 
48< 0.1%
 
6< 0.1%
 
"4< 0.1%
 
/2< 0.1%
 
&2< 0.1%
 
*1< 0.1%
 
+1< 0.1%
 

Most frequent Greek characters

ValueCountFrequency (%) 
Γ150.0%
 
η150.0%
 

Most frequent Han characters

ValueCountFrequency (%) 
320.0%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII400289> 99.9%
 
None66< 0.1%
 
CJK15< 0.1%
 
Punctuation6< 0.1%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
e378569.5%
 
|318037.9%
 
i300177.5%
 
a291797.3%
 
r282537.1%
 
n251306.3%
 
o246806.2%
 
t239916.0%
 
s224545.6%
 
183634.6%
 
l171084.3%
 
c145173.6%
 
d130813.3%
 
m109022.7%
 
u97852.4%
 
p97312.4%
 
g97132.4%
 
h96112.4%
 
f61631.5%
 
y59021.5%
 
b57491.4%
 
v41571.0%
 
w37790.9%
 
k30040.8%
 
x10930.3%
 
Other values (27)42681.1%
 

Most frequent None characters

ValueCountFrequency (%) 
 2639.4%
 
é2334.8%
 
ö34.5%
 
ä23.0%
 
ß23.0%
 
á11.5%
 
í11.5%
 
ç11.5%
 
ó11.5%
 
Γ11.5%
 
η11.5%
 
ú11.5%
 
ü11.5%
 
ű11.5%
 
ô11.5%
 

Most frequent Punctuation characters

ValueCountFrequency (%) 
6100.0%
 

Most frequent CJK characters

ValueCountFrequency (%) 
320.0%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 
16.7%
 

language
Categorical

Distinct: 47
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 37.6 KiB
Value              Count
English            4102
Français           108
UNK                98
Español            84
Deutsch            61
Other values (42)  350
Value              Count  Frequency (%)
English            4102   85.4%
Français           108    2.2%
UNK                98     2.0%
Español            84     1.7%
Deutsch            61     1.3%
العربية            33     0.7%
普通话             32     0.7%
Italiano           32     0.7%
Pусский            31     0.6%
Český              30     0.6%
广州话 / 廣州話    28     0.6%
日本語             23     0.5%
हिन्दी              22     0.5%
Português          17     0.4%
Dansk              12     0.2%
Latin              8      0.2%
한국어/조선말      8      0.2%
Nederlands         6      0.1%
עִבְרִית             6      0.1%
Afrikaans          5      0.1%
svenska            5      0.1%
ελληνικά           5      0.1%
Norsk              4      0.1%
Magyar             4      0.1%
Română             4      0.1%
Other values (22)  35     0.7%
 
Frequencies of value counts

Unique

Unique: 14
Unique (%): 0.3%
Histogram of lengths of the category

Length

Max length: 16
Median length: 7
Mean length: 6.904851135
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 133
Unique unicode categories: 7
Unique unicode scripts: 12
Unique unicode blocks: 12
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
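For a multilingual column like this one, the script of each character is the most telling property. Python's standard library does not expose the Unicode script property directly, so the sketch below uses a rough proxy: the first word of each letter's or mark's official character name (for example LATIN, CYRILLIC, ARABIC, CJK). This is an approximation chosen for illustration and will not match the report's script counts exactly; the helper name is hypothetical.

```python
import unicodedata
from collections import Counter


def script_proxy_counts(text):
    """Approximate per-script counts from the first word of character names.

    Letters and combining marks are grouped by name prefix; everything else
    (spaces, punctuation, digits) is counted under "OTHER".
    """
    counts = Counter()
    for ch in text:
        if unicodedata.category(ch)[0] in ("L", "M"):
            name = unicodedata.name(ch, "")
            counts[name.split(" ")[0] if name else "UNKNOWN"] += 1
        else:
            counts["OTHER"] += 1
    return counts


# "English", a separator, two Cyrillic letters and one CJK ideograph.
counts = script_proxy_counts("English / \u0441\u043a\u666e")
```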

Most occurring characters

ValueCountFrequency (%) 
s445613.4%
 
n429112.9%
 
i427912.9%
 
l423312.8%
 
E418812.6%
 
h416812.6%
 
g413112.5%
 
a4301.3%
 
o1500.5%
 
r1470.4%
 
t1230.4%
 
e1170.4%
 
N1100.3%
 
F1080.3%
 
ç1080.3%
 
K1010.3%
 
U980.3%
 
u970.3%
 
p870.3%
 
ñ840.3%
 
D730.2%
 
с650.2%
 
k630.2%
 
620.2%
 
c610.2%
 
Other values (108)13344.0%
 

Most occurring categories

Value              Count  Frequency (%)
Lowercase Letter   27379  82.6%
Uppercase Letter   4827   14.6%
Other Letter       763    2.3%
Space Separator    62     0.2%
Spacing Mark       49     0.1%
Other Punctuation  42     0.1%
Nonspacing Mark    42     0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
E418886.8%
 
N1102.3%
 
F1082.2%
 
K1012.1%
 
U982.0%
 
D731.5%
 
P511.1%
 
I320.7%
 
Č300.6%
 
L100.2%
 
A50.1%
 
R40.1%
 
M40.1%
 
У2< 0.1%
 
T2< 0.1%
 
V2< 0.1%
 
G2< 0.1%
 
B2< 0.1%
 
Í1< 0.1%
 
H1< 0.1%
 
S1< 0.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
s445616.3%
 
n429115.7%
 
i427915.6%
 
l423315.5%
 
h416815.2%
 
g413115.1%
 
a4301.6%
 
o1500.5%
 
r1470.5%
 
t1230.4%
 
e1170.4%
 
ç1080.4%
 
u970.4%
 
p870.3%
 
ñ840.3%
 
с650.2%
 
k630.2%
 
c610.2%
 
к370.1%
 
и350.1%
 
й330.1%
 
у310.1%
 
ý300.1%
 
ê170.1%
 
d13< 0.1%
 
Other values (28)930.3%
 

Most frequent Other Letter characters

ValueCountFrequency (%) 
607.9%
 
567.3%
 
ا374.8%
 
ر374.8%
 
ل334.3%
 
ع334.3%
 
ب334.3%
 
ي334.3%
 
ة334.3%
 
324.2%
 
324.2%
 
广283.7%
 
283.7%
 
283.7%
 
233.0%
 
233.0%
 
233.0%
 
222.9%
 
222.9%
 
222.9%
 
81.0%
 
81.0%
 
81.0%
 
81.0%
 
81.0%
 
Other values (22)8511.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
62100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
/3685.7%
 
?614.3%
 

Most frequent Nonspacing Mark characters

ValueCountFrequency (%) 
2252.4%
 
ִ1228.6%
 
ְ614.3%
 
24.8%
 

Most frequent Spacing Mark characters

ValueCountFrequency (%) 
ि2244.9%
 
2244.9%
 
ி24.1%
 
24.1%
 
12.0%
 

Most occurring scripts

Value       Count  Frequency (%)
Latin       31945  96.3%
Han         333    1.0%
Arabic      250    0.8%
Cyrillic    221    0.7%
Devanagari  132    0.4%
Common      104    0.3%
Hebrew      48     0.1%
Hangul      48     0.1%
Greek       40     0.1%
Thai        28     0.1%
Tamil       10     < 0.1%
Bengali     5      < 0.1%
 

Most frequent Latin characters

ValueCountFrequency (%) 
s445613.9%
 
n429113.4%
 
i427913.4%
 
l423313.3%
 
E418813.1%
 
h416813.0%
 
g413112.9%
 
a4301.3%
 
o1500.5%
 
r1470.5%
 
t1230.4%
 
e1170.4%
 
N1100.3%
 
F1080.3%
 
ç1080.3%
 
K1010.3%
 
U980.3%
 
u970.3%
 
p870.3%
 
ñ840.3%
 
D730.2%
 
k630.2%
 
c610.2%
 
P510.2%
 
I320.1%
 
Other values (25)1590.5%
 

Most frequent Han characters

ValueCountFrequency (%) 
6018.0%
 
5616.8%
 
329.6%
 
329.6%
 
广288.4%
 
288.4%
 
288.4%
 
236.9%
 
236.9%
 
236.9%
 

Most frequent Arabic characters

ValueCountFrequency (%) 
ا3714.8%
 
ر3714.8%
 
ل3313.2%
 
ع3313.2%
 
ب3313.2%
 
ي3313.2%
 
ة3313.2%
 
ف31.2%
 
س31.2%
 
ی31.2%
 
د10.4%
 
و10.4%
 

Most frequent Common characters

ValueCountFrequency (%) 
6259.6%
 
/3634.6%
 
?65.8%
 

Most frequent Cyrillic characters

ValueCountFrequency (%) 
с6529.4%
 
к3716.7%
 
и3515.8%
 
й3314.9%
 
у3114.0%
 
а31.4%
 
р31.4%
 
У20.9%
 
ї20.9%
 
н20.9%
 
ь20.9%
 
б10.5%
 
ъ10.5%
 
л10.5%
 
г10.5%
 
е10.5%
 
з10.5%
 

Most frequent Greek characters

ValueCountFrequency (%) 
λ1025.0%
 
ε512.5%
 
η512.5%
 
ν512.5%
 
ι512.5%
 
κ512.5%
 
ά512.5%
 

Most frequent Hebrew characters

ValueCountFrequency (%) 
ִ1225.0%
 
ע612.5%
 
ב612.5%
 
ְ612.5%
 
ר612.5%
 
י612.5%
 
ת612.5%
 

Most frequent Devanagari characters

ValueCountFrequency (%) 
2216.7%
 
ि2216.7%
 
2216.7%
 
2216.7%
 
2216.7%
 
2216.7%
 

Most frequent Hangul characters

ValueCountFrequency (%) 
816.7%
 
816.7%
 
816.7%
 
816.7%
 
816.7%
 
816.7%
 

Most frequent Thai characters

ValueCountFrequency (%) 
828.6%
 
414.3%
 
414.3%
 
414.3%
 
414.3%
 
414.3%
 

Most frequent Tamil characters

ValueCountFrequency (%) 
220.0%
 
220.0%
 
ி220.0%
 
220.0%
 
220.0%
 

Most frequent Bengali characters

ValueCountFrequency (%) 
240.0%
 
120.0%
 
120.0%
 
120.0%
 

Most occurring blocks

Value                 Count  Frequency (%)
ASCII                 31767  95.8%
CJK                   333    1.0%
None                  318    1.0%
Arabic                250    0.8%
Cyrillic              221    0.7%
Devanagari            132    0.4%
Hebrew                48     0.1%
Hangul                48     0.1%
Thai                  28     0.1%
Tamil                 10     < 0.1%
Bengali               5      < 0.1%
Latin Ext Additional  4      < 0.1%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
s445614.0%
 
n429113.5%
 
i427913.5%
 
l423313.3%
 
E418813.2%
 
h416813.1%
 
g413113.0%
 
a4301.4%
 
o1500.5%
 
r1470.5%
 
t1230.4%
 
e1170.4%
 
N1100.3%
 
F1080.3%
 
K1010.3%
 
U980.3%
 
u970.3%
 
p870.3%
 
D730.2%
 
k630.2%
 
620.2%
 
c610.2%
 
P510.2%
 
/360.1%
 
I320.1%
 
Other values (18)750.2%
 

Most frequent None characters

ValueCountFrequency (%) 
ç10834.0%
 
ñ8426.4%
 
Č309.4%
 
ý309.4%
 
ê175.3%
 
λ103.1%
 
ε51.6%
 
η51.6%
 
ν51.6%
 
ι51.6%
 
κ51.6%
 
ά51.6%
 
â41.3%
 
ă41.3%
 
Í10.3%
 

Most frequent CJK characters

ValueCountFrequency (%) 
6018.0%
 
5616.8%
 
329.6%
 
329.6%
 
广288.4%
 
288.4%
 
288.4%
 
236.9%
 
236.9%
 
236.9%
 

Most frequent Arabic characters

ValueCountFrequency (%) 
ا3714.8%
 
ر3714.8%
 
ل3313.2%
 
ع3313.2%
 
ب3313.2%
 
ي3313.2%
 
ة3313.2%
 
ف31.2%
 
س31.2%
 
ی31.2%
 
د10.4%
 
و10.4%
 

Most frequent Cyrillic characters

ValueCountFrequency (%) 
с6529.4%
 
к3716.7%
 
и3515.8%
 
й3314.9%
 
у3114.0%
 
а31.4%
 
р31.4%
 
У20.9%
 
ї20.9%
 
н20.9%
 
ь20.9%
 
б10.5%
 
ъ10.5%
 
л10.5%
 
г10.5%
 
е10.5%
 
з10.5%
 

Most frequent Hebrew characters

ValueCountFrequency (%) 
ִ1225.0%
 
ע612.5%
 
ב612.5%
 
ְ612.5%
 
ר612.5%
 
י612.5%
 
ת612.5%
 

Most frequent Devanagari characters

ValueCountFrequency (%) 
2216.7%
 
ि2216.7%
 
2216.7%
 
2216.7%
 
2216.7%
 
2216.7%
 

Most frequent Hangul characters

ValueCountFrequency (%) 
816.7%
 
816.7%
 
816.7%
 
816.7%
 
816.7%
 
816.7%
 

Most frequent Thai characters

ValueCountFrequency (%) 
828.6%
 
414.3%
 
414.3%
 
414.3%
 
414.3%
 
414.3%
 

Most frequent Latin Ext Additional characters

ValueCountFrequency (%) 
ế250.0%
 
250.0%
 

Most frequent Tamil characters

ValueCountFrequency (%) 
220.0%
 
220.0%
 
ி220.0%
 
220.0%
 
220.0%
 

Most frequent Bengali characters

ValueCountFrequency (%) 
240.0%
 
120.0%
 
120.0%
 
120.0%
 

original_title
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 4801
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 37.6 KiB
Value                Count
Out of the Blue      2
Batman               2
Slither              1
The Benchwarmers     1
Knocked Up           1
Other values (4796)  4796
Value                      Count  Frequency (%)
Out of the Blue            2      < 0.1%
Batman                     2      < 0.1%
Slither                    1      < 0.1%
The Benchwarmers           1      < 0.1%
Knocked Up                 1      < 0.1%
Fever Pitch                1      < 0.1%
Miracle at St. Anna        1      < 0.1%
Bandidas                   1      < 0.1%
R100                       1      < 0.1%
Appaloosa                  1      < 0.1%
キャプテンハーロック       1      < 0.1%
The Railway Man            1      < 0.1%
Boyhood                    1      < 0.1%
Killer Elite               1      < 0.1%
Kansas City                1      < 0.1%
The Grudge                 1      < 0.1%
The Yellow Handkerchief    1      < 0.1%
Truth or Dare              1      < 0.1%
Wimbledon                  1      < 0.1%
Love & Basketball          1      < 0.1%
The Bounty Hunter          1      < 0.1%
Kung Pow: Enter the Fist   1      < 0.1%
Wonderland                 1      < 0.1%
Bully                      1      < 0.1%
Return to the Blue Lagoon  1      < 0.1%
Other values (4776)        4776   99.4%
 
Frequencies of value counts

Unique

Unique: 4799
Unique (%): 99.9%
Histogram of lengths of the category

Length

Max length: 86
Median length: 14
Mean length: 15.22298563
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 410
Unique unicode categories: 19
Unique unicode scripts: 13
Unique unicode blocks: 14
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
843611.5%
 
e741210.1%
 
a46256.3%
 
o43706.0%
 
r38925.3%
 
n38855.3%
 
i37235.1%
 
t36004.9%
 
s28573.9%
 
h27173.7%
 
l24263.3%
 
d17602.4%
 
T15622.1%
 
u15252.1%
 
c11791.6%
 
g11371.6%
 
y10741.5%
 
m10661.5%
 
S9721.3%
 
f8151.1%
 
M7771.1%
 
B7331.0%
 
p6951.0%
 
D6800.9%
 
C6380.9%
 
Other values (385)1056014.4%
 

Most occurring categories

Value                  Count  Frequency (%)
Lowercase Letter       51401  70.3%
Uppercase Letter       11413  15.6%
Space Separator        8436   11.5%
Other Punctuation      906    1.2%
Decimal Number         494    0.7%
Other Letter           328    0.4%
Dash Punctuation       84     0.1%
Spacing Mark           16     < 0.1%
Nonspacing Mark        8      < 0.1%
Open Punctuation       6      < 0.1%
Close Punctuation      6      < 0.1%
Currency Symbol        5      < 0.1%
Other Number           4      < 0.1%
Final Punctuation      3      < 0.1%
Math Symbol            2      < 0.1%
Modifier Letter        1      < 0.1%
Connector Punctuation  1      < 0.1%
Format                 1      < 0.1%
Other Symbol           1      < 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
T156213.7%
 
S9728.5%
 
M7776.8%
 
B7336.4%
 
D6806.0%
 
C6385.6%
 
A6185.4%
 
L5655.0%
 
H5264.6%
 
P4704.1%
 
W4694.1%
 
R4544.0%
 
G4473.9%
 
I4433.9%
 
F4353.8%
 
E3022.6%
 
N2622.3%
 
O2121.9%
 
J1911.7%
 
K1761.5%
 
V1421.2%
 
Y1251.1%
 
U1111.0%
 
Z420.4%
 
Q250.2%
 
Other values (12)360.3%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e741214.4%
 
a46259.0%
 
o43708.5%
 
r38927.6%
 
n38857.6%
 
i37237.2%
 
t36007.0%
 
s28575.6%
 
h27175.3%
 
l24264.7%
 
d17603.4%
 
u15253.0%
 
c11792.3%
 
g11372.2%
 
y10742.1%
 
m10662.1%
 
f8151.6%
 
p6951.4%
 
k6351.2%
 
v5771.1%
 
w4720.9%
 
b4630.9%
 
x1410.3%
 
z940.2%
 
j540.1%
 
Other values (55)2070.4%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
8436100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
:34137.6%
 
'22224.5%
 
.14115.6%
 
,758.3%
 
&606.6%
 
!353.9%
 
?182.0%
 
/70.8%
 
#20.2%
 
*20.2%
 
·10.1%
 
10.1%
 
10.1%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-8297.6%
 
22.4%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
214429.1%
 
17815.8%
 
37415.0%
 
07415.0%
 
4357.1%
 
8214.3%
 
5214.3%
 
9173.4%
 
7153.0%
 
6153.0%
 

Most frequent Other Letter characters

ValueCountFrequency (%) 
72.1%
 
ا72.1%
 
ی61.8%
 
51.5%
 
41.2%
 
41.2%
 
41.2%
 
ن41.2%
 
30.9%
 
30.9%
 
30.9%
 
30.9%
 
30.9%
 
س30.9%
 
ر30.9%
 
د30.9%
 
ه30.9%
 
20.6%
 
20.6%
 
20.6%
 
20.6%
 
20.6%
 
20.6%
 
20.6%
 
20.6%
 
Other values (214)24474.4%
 

Most frequent Other Number characters

ValueCountFrequency (%) 
³125.0%
 
125.0%
 
½125.0%
 
²125.0%
 

Most frequent Currency Symbol characters

ValueCountFrequency (%) 
$360.0%
 
¢240.0%
 

Most frequent Modifier Letter characters

ValueCountFrequency (%) 
1100.0%
 

Most frequent Spacing Mark characters

ValueCountFrequency (%) 
531.2%
 
318.8%
 
318.8%
 
212.5%
 
ि212.5%
 
16.2%
 

Most frequent Nonspacing Mark characters

ValueCountFrequency (%) 
225.0%
 
225.0%
 
112.5%
 
112.5%
 
112.5%
 
112.5%
 

Most frequent Math Symbol characters

ValueCountFrequency (%) 
+2100.0%
 

Most frequent Final Punctuation characters

ValueCountFrequency (%) 
3100.0%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
(466.7%
 
[233.3%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
)466.7%
 
]233.3%
 

Most frequent Connector Punctuation characters

ValueCountFrequency (%) 
_1100.0%
 

Most frequent Format characters

ValueCountFrequency (%) 
1100.0%
 

Most frequent Other Symbol characters

ValueCountFrequency (%) 
°1100.0%
 

Most occurring scripts

Value       Count  Frequency (%)
Latin       62677  85.7%
Common      9949   13.6%
Han         160    0.2%
Cyrillic    127    0.2%
Hangul      45     0.1%
Arabic      39     0.1%
Devanagari  35     < 0.1%
Katakana    26     < 0.1%
Hiragana    17     < 0.1%
Thai        17     < 0.1%
Tamil       13     < 0.1%
Greek       10     < 0.1%
Inherited   1      < 0.1%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e741211.8%
 
a46257.4%
 
o43707.0%
 
r38926.2%
 
n38856.2%
 
i37235.9%
 
t36005.7%
 
s28574.6%
 
h27174.3%
 
l24263.9%
 
d17602.8%
 
T15622.5%
 
u15252.4%
 
c11791.9%
 
g11371.8%
 
y10741.7%
 
m10661.7%
 
S9721.6%
 
f8151.3%
 
M7771.2%
 
B7331.2%
 
p6951.1%
 
D6801.1%
 
C6381.0%
 
k6351.0%
 
Other values (47)792212.6%
 

Most frequent Common characters

ValueCountFrequency (%) 
843684.8%
 
:3413.4%
 
'2222.2%
 
21441.4%
 
.1411.4%
 
-820.8%
 
1780.8%
 
,750.8%
 
3740.7%
 
0740.7%
 
&600.6%
 
4350.4%
 
!350.4%
 
8210.2%
 
5210.2%
 
?180.2%
 
9170.2%
 
7150.2%
 
6150.2%
 
/70.1%
 
(4< 0.1%
 
)4< 0.1%
 
3< 0.1%
 
$3< 0.1%
 
¢2< 0.1%
 
Other values (16)220.2%
 

Most frequent Katakana characters

ValueCountFrequency (%) 
311.5%
 
27.7%
 
27.7%
 
27.7%
 
27.7%
 
13.8%
 
13.8%
 
13.8%
 
13.8%
 
13.8%
 
13.8%
 
13.8%
 
13.8%
 
13.8%
 
13.8%
 
13.8%
 
13.8%
 
13.8%
 
13.8%
 
13.8%
 

Most frequent Han characters

ValueCountFrequency (%) 
53.1%
 
42.5%
 
42.5%
 
31.9%
 
31.9%
 
31.9%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
Other values (98)10062.5%
 

Most frequent Cyrillic characters

ValueCountFrequency (%) 
о1814.2%
 
а107.9%
 
е107.9%
 
р97.1%
 
н86.3%
 
л75.5%
 
и64.7%
 
в64.7%
 
С53.9%
 
д43.1%
 
к43.1%
 
б43.1%
 
г43.1%
 
с32.4%
 
з32.4%
 
я21.6%
 
т21.6%
 
ы21.6%
 
у21.6%
 
п21.6%
 
ц10.8%
 
Б10.8%
 
З10.8%
 
ё10.8%
 
М10.8%
 
Other values (11)118.7%
 

Most frequent Hiragana characters

ValueCountFrequency (%) 
741.2%
 
211.8%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 

Most frequent Hangul characters

ValueCountFrequency (%) 
36.7%
 
24.4%
 
24.4%
 
24.4%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
Other values (15)1533.3%
 

Most frequent Tamil characters

ValueCountFrequency (%) 
323.1%
 
215.4%
 
215.4%
 
17.7%
 
17.7%
 
17.7%
 
17.7%
 
17.7%
 
17.7%
 

Most frequent Devanagari characters

ValueCountFrequency (%) 
514.3%
 
411.4%
 
38.6%
 
25.7%
 
25.7%
 
ि25.7%
 
25.7%
 
25.7%
 
25.7%
 
12.9%
 
12.9%
 
12.9%
 
12.9%
 
12.9%
 
12.9%
 
12.9%
 
12.9%
 
12.9%
 
12.9%
 
12.9%
 

Most frequent Thai characters

ValueCountFrequency (%) 
211.8%
 
211.8%
 
211.8%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 

Most frequent Arabic characters

ValueCountFrequency (%) 
ا717.9%
 
ی615.4%
 
ن410.3%
 
س37.7%
 
ر37.7%
 
د37.7%
 
ه37.7%
 
ب25.1%
 
م25.1%
 
ك12.6%
 
ت12.6%
 
ج12.6%
 
ز12.6%
 
چ12.6%
 
آ12.6%
 

Most frequent Greek characters

ValueCountFrequency (%) 
ν220.0%
 
Κ110.0%
 
υ110.0%
 
ό110.0%
 
δ110.0%
 
ο110.0%
 
τ110.0%
 
α110.0%
 
ς110.0%
 

Most frequent Inherited characters

ValueCountFrequency (%) 
1100.0%
 

Most occurring blocks

Value                 Count  Frequency (%)
ASCII                 72554  99.2%
CJK                   160    0.2%
Cyrillic              127    0.2%
None                  72     0.1%
Hangul                45     0.1%
Arabic                39     0.1%
Devanagari            35     < 0.1%
Katakana              28     < 0.1%
Hiragana              17     < 0.1%
Thai                  17     < 0.1%
Tamil                 13     < 0.1%
Punctuation           7      < 0.1%
Number Forms          1      < 0.1%
Latin Ext Additional  1      < 0.1%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
843611.6%
 
e741210.2%
 
a46256.4%
 
o43706.0%
 
r38925.4%
 
n38855.4%
 
i37235.1%
 
t36005.0%
 
s28573.9%
 
h27173.7%
 
l24263.3%
 
d17602.4%
 
T15622.2%
 
u15252.1%
 
c11791.6%
 
g11371.6%
 
y10741.5%
 
m10661.5%
 
S9721.3%
 
f8151.1%
 
M7771.1%
 
B7331.0%
 
p6951.0%
 
D6800.9%
 
C6380.9%
 
Other values (56)999813.8%
 

Most frequent None characters

ValueCountFrequency (%) 
é1825.0%
 
à45.6%
 
è45.6%
 
ó45.6%
 
á34.2%
 
í34.2%
 
å34.2%
 
ü22.8%
 
¢22.8%
 
ñ22.8%
 
ă22.8%
 
ø22.8%
 
ν22.8%
 
·11.4%
 
É11.4%
 
³11.4%
 
Æ11.4%
 
ç11.4%
 
½11.4%
 
ë11.4%
 
²11.4%
 
ư11.4%
 
î11.4%
 
ș11.4%
 
ų11.4%
 
Other values (9)912.5%
 

Most frequent Katakana characters

ValueCountFrequency (%) 
310.7%
 
27.1%
 
27.1%
 
27.1%
 
27.1%
 
13.6%
 
13.6%
 
13.6%
 
13.6%
 
13.6%
 
13.6%
 
13.6%
 
13.6%
 
13.6%
 
13.6%
 
13.6%
 
13.6%
 
13.6%
 
13.6%
 
13.6%
 
13.6%
 
13.6%
 

Most frequent CJK characters

ValueCountFrequency (%) 
53.1%
 
42.5%
 
42.5%
 
31.9%
 
31.9%
 
31.9%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
21.2%
 
Other values (98)10062.5%
 

Most frequent Cyrillic characters

ValueCountFrequency (%) 
о1814.2%
 
а107.9%
 
е107.9%
 
р97.1%
 
н86.3%
 
л75.5%
 
и64.7%
 
в64.7%
 
С53.9%
 
д43.1%
 
к43.1%
 
б43.1%
 
г43.1%
 
с32.4%
 
з32.4%
 
я21.6%
 
т21.6%
 
ы21.6%
 
у21.6%
 
п21.6%
 
ц10.8%
 
Б10.8%
 
З10.8%
 
ё10.8%
 
М10.8%
 
Other values (11)118.7%
 

Most frequent Hiragana characters

ValueCountFrequency (%) 
741.2%
 
211.8%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 

Most frequent Hangul characters

ValueCountFrequency (%) 
36.7%
 
24.4%
 
24.4%
 
24.4%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
12.2%
 
Other values (15)1533.3%
 

Most frequent Number Forms characters

ValueCountFrequency (%) 
1100.0%
 

Most frequent Tamil characters

ValueCountFrequency (%) 
323.1%
 
215.4%
 
215.4%
 
17.7%
 
17.7%
 
17.7%
 
17.7%
 
17.7%
 
17.7%
 

Most frequent Punctuation characters

ValueCountFrequency (%) 
342.9%
 
228.6%
 
114.3%
 
114.3%
 

Most frequent Devanagari characters

ValueCountFrequency (%) 
514.3%
 
411.4%
 
38.6%
 
25.7%
 
25.7%
 
ि25.7%
 
25.7%
 
25.7%
 
25.7%
 
12.9%
 
12.9%
 
12.9%
 
12.9%
 
12.9%
 
12.9%
 
12.9%
 
12.9%
 
12.9%
 
12.9%
 
12.9%
 

Most frequent Thai characters

ValueCountFrequency (%) 
211.8%
 
211.8%
 
211.8%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 
15.9%
 

Most frequent Arabic characters

ValueCountFrequency (%) 
ا717.9%
 
ی615.4%
 
ن410.3%
 
س37.7%
 
ر37.7%
 
د37.7%
 
ه37.7%
 
ب25.1%
 
م25.1%
 
ك12.6%
 
ت12.6%
 
ج12.6%
 
ز12.6%
 
چ12.6%
 
آ12.6%
 

Most frequent Latin Ext Additional characters

ValueCountFrequency (%) 
1100.0%
 

overview
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 4801
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 37.6 KiB
UNK
 
3
It's been many years since Freddy Krueger's first victim, Nancy, came face-to-face with Freddy and his sadistic, evil ways. Now, Nancy's all grown up; she's put her frightening nightmares behind her and is helping teens cope with their dreams. Too bad Freddy's decided to herald his return by invading the kids' dreams and scaring them into committing suicide.
 
1
A Savage beast, grown to monstrous size and driven mad by toxic wastes that are poisoning the waters, spreads terror and death on a Maine countryside.
 
1
The characters we met a little more than a decade ago are returning to East Great Falls for their high-school reunion. In one long-overdue weekend, they will discover what has changed, who hasn’t and that time and distance can’t break the bonds of friendship. It was summer 1999 when four small-town Michigan boys began a quest to lose their virginity. In the years that have passed, Jim and Michelle married while Kevin and Vicky said goodbye. Oz and Heather grew apart, but Finch still longs for Stifler’s mom. Now these lifelong friends have come home as adults to reminisce about – and get inspired by – the hormonal teens who launched a comedy legend.
 
1
Arrogant, self-centered movie director Guido Contini finds himself struggling to find meaning, purpose, and a script for his latest film endeavor. With only a week left before shooting begins, he desperately searches for answers and inspiration from his wife, his mistress, his muse, and his mother.
 
1
Other values (4796)
4796 
ValueCountFrequency (%) 
UNK30.1%
 
It's been many years since Freddy Krueger's first victim, Nancy, came face-to-face with Freddy and his sadistic, evil ways. Now, Nancy's all grown up; she's put her frightening nightmares behind her and is helping teens cope with their dreams. Too bad Freddy's decided to herald his return by invading the kids' dreams and scaring them into committing suicide.1< 0.1%
 
A Savage beast, grown to monstrous size and driven mad by toxic wastes that are poisoning the waters, spreads terror and death on a Maine countryside.1< 0.1%
 
The characters we met a little more than a decade ago are returning to East Great Falls for their high-school reunion. In one long-overdue weekend, they will discover what has changed, who hasn’t and that time and distance can’t break the bonds of friendship. It was summer 1999 when four small-town Michigan boys began a quest to lose their virginity. In the years that have passed, Jim and Michelle married while Kevin and Vicky said goodbye. Oz and Heather grew apart, but Finch still longs for Stifler’s mom. Now these lifelong friends have come home as adults to reminisce about – and get inspired by – the hormonal teens who launched a comedy legend.1< 0.1%
 
Arrogant, self-centered movie director Guido Contini finds himself struggling to find meaning, purpose, and a script for his latest film endeavor. With only a week left before shooting begins, he desperately searches for answers and inspiration from his wife, his mistress, his muse, and his mother.1< 0.1%
 
On the night of 16 July 1942, ten year old Sarah and her parents are being arrested and transported to the Velodrome d'Hiver in Paris where thousands of other jews are being send to get deported. Sarah however managed to lock her little brother in a closed just before the police entered their appartment.Sixty years later, Julia Jarmond, an American journalist in Paris, gets the assignment to write an article about this raid, a black page in the history of France. She starts digging archives and through Sarah's file discovers a well kept secret about her own in-laws.1< 0.1%
 
The true story of British athletes preparing for and competing in the 1924 Summer Olympics.1< 0.1%
 
Summertime on the coast of Maine, "In the Bedroom" centers on the inner dynamics of a family in transition. Matt Fowler is a doctor practicing in his native Maine and is married to New York born Ruth Fowler, a music teacher. He is involved in a love affair with a local single mother. As the beauty of Maine's brief and fleeting summer comes to an end, these characters find themselves in the midst of unimaginable tragedy.1< 0.1%
 
We always knew they were coming back. Using recovered alien technology, the nations of Earth have collaborated on an immense defense program to protect the planet. But nothing can prepare us for the aliens’ advanced and unprecedented force. Only the ingenuity of a few brave men and women can bring our world back from the brink of extinction.1< 0.1%
 
Manhattanite Ashley is known to many as the luckiest woman around. After a chance encounter with a down-and-out young man, however, she realizes that she's swapped her fortune for his.1< 0.1%
 
Medical researcher Frank, his fiancee Zoe and their team have achieved the impossible: they have found a way to revive the dead. After a successful, but unsanctioned, experiment on a lifeless animal, they are ready to make their work public. However, when their dean learns what they've done, he shuts them down. Zoe is killed during an attempt to recreate the experiment, leading Frank to test the process on her. Zoe is revived -- but something evil is within her.1< 0.1%
 
Jason Kelly is one week away from marrying his boss's uber-controlling daughter, putting him on the fast track for a partnership at the law firm. However, when the straight-laced Jason is tricked into driving his foul-mouthed grandfather, Dick, to Daytona for spring break, his pending nuptials are suddenly in jeopardy. Between riotous frat parties, bar fights, and an epic night of karaoke, Dick is on a quest to live his life to the fullest and bring Jason along for the ride.1< 0.1%
 
A sparkling comedic chronicle of a middle-class young man’s romantic misadventures among New York City’s debutante society. Stillman’s deft, literate dialogue and hilariously highbrow observations earned this debut film an Academy Award nomination for Best Original Screenplay. Alongside the wit and sophistication, though, lies a tender tale of adolescent anxiety.1< 0.1%
 
A rich college kid is taught a lesson after a joy ride ends up destroying a country restaurant.1< 0.1%
 
In Colombia just after the Great War, an old man falls from a ladder; dying, he professes great love for his wife. After the funeral, a man calls on the widow - she dismisses him angrily. Flash back more than 50 years to the day Florentino Ariza, a telegraph boy, falls in love with Fermina Daza, the daughter of a mule trader.1< 0.1%
 
In a beauty salon in Beirut the lives of five women cross paths. The beauty salon is a colorful and sensual microcosm where they share and entrust their hopes, fears and expectations.1< 0.1%
 
When The Man in the Yellow Hat befriends Curious George in the jungle, they set off on a non-stop, fun-filled journey through the wonders of the big city toward the warmth of true friendship.1< 0.1%
 
Set in the late 19th century. When a ruthless robber baron takes away everything they cherish, a rough-and-tumble, idealistic peasant and a sophisticated heiress embark on a quest for justice, vengeance…and a few good heists.1< 0.1%
 
A timid magazine photo manager who lives life vicariously through daydreams embarks on a true-life adventure when a negative goes missing.1< 0.1%
 
Katherine Morrissey, a former Christian missionary, lost her faith after the tragic deaths of her family. Now she applies her expertise to debunking religious phenomena. When a series of biblical plagues overrun a small town, Katherine arrives to prove that a supernatural force is not behind the occurrences, but soon finds that science cannot explain what is happening. Instead, she must regain her faith to combat the evil that waits in a Louisiana swamp.1< 0.1%
 
Raju, a waiter, is in love with the famous TV reporter Greeta Kapoor. After a man is murdered, Kapoor shows up at Raju's door to ask him some questions - it turns out that Raju served the dead man his last supper, and the authorities hope that he might be able to help them. Raju lies and says that he was an eye witness, in order to spend more time with Kapoor. He gives the police a false description of the killer, but it matches his best friend Kutti, so soon Kutti is wanted by the police, and the Mafia, who is responsible for the killing, is after Raju.1< 0.1%
 
A law firm brings in its 'fixer' to remedy the situation after a lawyer has a breakdown while representing a chemical company that he knows is guilty in a multi-billion dollar class action suit.1< 0.1%
 
When Dustin's girlfriend, Alexis, breaks up with him, he employs his best buddy, Tank, to take her out on the worst rebound date imaginable in the hopes that it will send her running back into his arms. But when Tank begins to really fall for Alexis, he finds himself in an impossible position.1< 0.1%
 
Fugitives of the Federation for their daring rescue of Spock from the doomed Genesis Planet, Admiral Kirk (William Shatner) and his crew begin their journey home to face justice for their actions. But as they near Earth, they find it at the mercy of a mysterious alien presence whose signals are slowly destroying the planet. In a desperate attempt to answer the call of the probe, Kirk and his crew race back to the late twentieth century. However they soon find the world they once knew to be more alien than anything they've encountered in the far reaches of the galaxy!1< 0.1%
 
Pete Sandidge (Tracy), a daredevil bomber pilot, dies when he crashes his plane into a German aircraft carrier, leaving his devoted girlfriend, Dorinda (Irene Dunne), who is also a pilot, heartbroken. In heaven, Pete receives a new assignment: he is to become the guardian angel for Ted Randall (Van Johnson), a young Army flyer. Invisibly, Pete guides Ted through flight school and into combat, but the ectoplasmic mentor's tolerance is tested when Ted falls for Dorinda. Ultimately, however, Pete not only comes to terms with their relationship but also acts as Dorinda's copilot when she undertakes a dangerous bombing raid, so that Ted won't have to. Remade by Steven Speilberg in 1989 as ALWAYS1< 0.1%
 
Other values (4776)477699.4%
 
Frequencies of value counts

Unique

Unique: 4800
Unique (%): 99.9%
Histogram of lengths of the category

Length

Max length: 1000
Median length: 283
Mean length: 305.2098688
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 127
Unique unicode categories: 19
Unique unicode scripts: 2
Unique unicode blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
24571216.8%
 
e1411789.6%
 
t971046.6%
 
a949076.5%
 
i853295.8%
 
o843735.8%
 
n840155.7%
 
s783435.3%
 
r771545.3%
 
h619484.2%
 
l486813.3%
 
d415522.8%
 
c324772.2%
 
u298382.0%
 
m282401.9%
 
f263981.8%
 
g254301.7%
 
y203871.4%
 
p201081.4%
 
w191081.3%
 
b158141.1%
 
,133880.9%
 
v127910.9%
 
.120560.8%
 
k91500.6%
 
Other values (102)604424.1%
 

Most occurring categories

Value                  Count    Frequency (%)
Lowercase Letter       1139845  77.8%
Space Separator        245716   16.8%
Uppercase Letter       39048    2.7%
Other Punctuation      30988    2.1%
Dash Punctuation       4474     0.3%
Decimal Number         3915     0.3%
Open Punctuation       757      0.1%
Close Punctuation      754      0.1%
Final Punctuation      310      < 0.1%
Initial Punctuation    49       < 0.1%
Currency Symbol        46       < 0.1%
Math Symbol            5        < 0.1%
Other Symbol           4        < 0.1%
Connector Punctuation  3        < 0.1%
Format                 3        < 0.1%
Control                3        < 0.1%
Other Number           1        < 0.1%
Modifier Letter        1        < 0.1%
Modifier Symbol        1        < 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
A444811.4%
 
T32688.4%
 
S29347.5%
 
B26716.8%
 
C24676.3%
 
M23075.9%
 
W20985.4%
 
H18084.6%
 
D16824.3%
 
I16044.1%
 
J15914.1%
 
R14603.7%
 
L13943.6%
 
P13213.4%
 
F13153.4%
 
E12093.1%
 
N11613.0%
 
G10522.7%
 
K8472.2%
 
O6881.8%
 
V5351.4%
 
U4891.3%
 
Y4131.1%
 
Z1280.3%
 
Q1100.3%
 
Other values (3)480.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e14117812.4%
 
t971048.5%
 
a949078.3%
 
i853297.5%
 
o843737.4%
 
n840157.4%
 
s783436.9%
 
r771546.8%
 
h619485.4%
 
l486814.3%
 
d415523.6%
 
c324772.8%
 
u298382.6%
 
m282402.5%
 
f263982.3%
 
g254302.2%
 
y203871.8%
 
p201081.8%
 
w191081.7%
 
b158141.4%
 
v127911.1%
 
k91500.8%
 
x21130.2%
 
j12990.1%
 
z11960.1%
 
Other values (21)9120.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
245712> 99.9%
 
 4< 0.1%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
192623.7%
 
082121.0%
 
956814.5%
 
23719.5%
 
52315.9%
 
72245.7%
 
82065.3%
 
31995.1%
 
41894.8%
 
61804.6%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
,1338843.2%
 
.1205638.9%
 
'357711.5%
 
"10213.3%
 
:2840.9%
 
?2180.7%
 
;1750.6%
 
!1420.5%
 
/570.2%
 
350.1%
 
&290.1%
 
·2< 0.1%
 
#2< 0.1%
 
¡1< 0.1%
 
%1< 0.1%
 

Most frequent Final Punctuation characters

ValueCountFrequency (%) 
27588.7%
 
3511.3%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-420694.0%
 
2054.6%
 
621.4%
 
1< 0.1%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
(75599.7%
 
[20.3%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
)75399.9%
 
]10.1%
 

Most frequent Initial Punctuation characters

ValueCountFrequency (%) 
3571.4%
 
1428.6%
 

Most frequent Other Number characters

ValueCountFrequency (%) 
¹1100.0%
 

Most frequent Currency Symbol characters

ValueCountFrequency (%) 
$4597.8%
 
£12.2%
 

Most frequent Math Symbol characters

ValueCountFrequency (%) 
+240.0%
 
120.0%
 
|120.0%
 
~120.0%
 

Most frequent Other Symbol characters

ValueCountFrequency (%) 
®375.0%
 
¦125.0%
 

Most frequent Connector Punctuation characters

ValueCountFrequency (%) 
_3100.0%
 

Most frequent Format characters

ValueCountFrequency (%) 
­3100.0%
 

Most frequent Control characters

ValueCountFrequency (%) 
3100.0%
 

Most frequent Modifier Letter characters

ValueCountFrequency (%) 
ʼ1100.0%
 

Most frequent Modifier Symbol characters

ValueCountFrequency (%) 
`1100.0%
 

Most occurring scripts

Value   Count    Frequency (%)
Latin   1178893  80.4%
Common  287030   19.6%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e14117812.0%
 
t971048.2%
 
a949078.1%
 
i853297.2%
 
o843737.2%
 
n840157.1%
 
s783436.6%
 
r771546.5%
 
h619485.3%
 
l486814.1%
 
d415523.5%
 
c324772.8%
 
u298382.5%
 
m282402.4%
 
f263982.2%
 
g254302.2%
 
y203871.7%
 
p201081.7%
 
w191081.6%
 
b158141.3%
 
v127911.1%
 
k91500.8%
 
A44480.4%
 
T32680.3%
 
S29340.2%
 
Other values (49)339182.9%
 

Most frequent Common characters

ValueCountFrequency (%) 
24571285.6%
 
,133884.7%
 
.120564.2%
 
-42061.5%
 
'35771.2%
 
"10210.4%
 
19260.3%
 
08210.3%
 
(7550.3%
 
)7530.3%
 
95680.2%
 
23710.1%
 
:2840.1%
 
2750.1%
 
52310.1%
 
72240.1%
 
?2180.1%
 
82060.1%
 
2050.1%
 
31990.1%
 
41890.1%
 
61800.1%
 
;1750.1%
 
!142< 0.1%
 
62< 0.1%
 
Other values (28)2860.1%
 

Most occurring blocks

Value             Count    Frequency (%)
ASCII             1465088  99.9%
Punctuation       662      < 0.1%
None              170      < 0.1%
Alphabetic PF     1        < 0.1%
Math Operators    1        < 0.1%
Modifier Letters  1        < 0.1%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
24571216.8%
 
e1411789.6%
 
t971046.6%
 
a949076.5%
 
i853295.8%
 
o843735.8%
 
n840155.7%
 
s783435.3%
 
r771545.3%
 
h619484.2%
 
l486813.3%
 
d415522.8%
 
c324772.2%
 
u298382.0%
 
m282401.9%
 
f263981.8%
 
g254301.7%
 
y203871.4%
 
p201081.4%
 
w191081.3%
 
b158141.1%
 
,133880.9%
 
v127910.9%
 
.120560.8%
 
k91500.6%
 
Other values (62)596074.1%
 

Most frequent Punctuation characters

ValueCountFrequency (%) 
27541.5%
 
20531.0%
 
629.4%
 
355.3%
 
355.3%
 
355.3%
 
142.1%
 
10.2%
 

Most frequent None characters

ValueCountFrequency (%) 
é9254.1%
 
á127.1%
 
ó84.7%
 
 42.4%
 
ï42.4%
 
è42.4%
 
ç42.4%
 
ö42.4%
 
í42.4%
 
î31.8%
 
ü31.8%
 
à31.8%
 
ñ31.8%
 
®31.8%
 
­31.8%
 
·21.2%
 
ø21.2%
 
¹10.6%
 
ô10.6%
 
ë10.6%
 
Æ10.6%
 
Â10.6%
 
¡10.6%
 
¦10.6%
 
£10.6%
 
Other values (4)42.4%
 

Most frequent Alphabetic PF characters

ValueCountFrequency (%) 
1100.0%
 

Most frequent Math Operators characters

ValueCountFrequency (%) 
1100.0%
 

Most frequent Modifier Letters characters

ValueCountFrequency (%) 
ʼ1100.0%
 

popularity
Real number (ℝ≥0)

Distinct: 4802
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 21.49230059
Minimum: 0
Maximum: 875.581305
Zeros: 1
Zeros (%): < 0.1%
Memory size: 37.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.3628167
Q1: 4.66807
Median: 12.921594
Q3: 28.3135045
95-th percentile: 67.3859622
Maximum: 875.581305
Range: 875.581305
Interquartile range (IQR): 23.6454345

Descriptive statistics

Standard deviation: 31.81664975
Coefficient of variation (CV): 1.480374314
Kurtosis: 191.9958205
Mean: 21.49230059
Median Absolute Deviation (MAD): 9.814445
Skewness: 9.721415886
Sum: 103227.5197
Variance: 1012.299201
Monotonicity: Not monotonic
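
Several of the figures above are derived from others in the same report, so they can be cross-checked by hand. The sketch below recomputes the interquartile range, range, coefficient of variation, variance and sum from the reported primary values. The inputs are copied from this report; the standard textbook definitions used (IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × number of observations) are assumed to be the ones the report applies.

```python
import math

# Values copied from the report for `popularity`.
n_observations = 4803
mean_value = 21.49230059
std_dev = 31.81664975
q1, q3 = 4.66807, 28.3135045
minimum, maximum = 0.0, 875.581305

iqr = q3 - q1                       # interquartile range
value_range = maximum - minimum     # range
cv = std_dev / mean_value           # coefficient of variation
variance = std_dev ** 2             # variance
total = mean_value * n_observations # sum of all values

# Compare against the figures printed in the report (rounded inputs, so
# a small absolute tolerance is used instead of exact equality).
print(math.isclose(iqr, 23.6454345, abs_tol=1e-6))
print(math.isclose(value_range, 875.581305, abs_tol=1e-9))
print(math.isclose(cv, 1.480374314, abs_tol=1e-6))
print(math.isclose(variance, 1012.299201, abs_tol=1e-3))
print(math.isclose(total, 103227.5197, abs_tol=1e-2))
```

A CV above 1 together with a skewness of about 9.7 indicates a strongly right-skewed variable: the mean (21.49) sits well above the median (12.92).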
Histogram with fixed size bins (bins=50)
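
With 50 equal-width bins over the reported range of 875.581305, each bin is about 17.51 wide. Because the 95-th percentile (67.3859622) lies below the upper edge of the fourth bin (about 70.05), at least 95% of the observations fall into the first four bins. The sketch below shows one way to compute fixed-size bin counts in plain Python. It assumes the bins span the minimum to the maximum, which the report does not state explicitly, and the function name is hypothetical.

```python
def fixed_bin_counts(values, bins=50):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    if width == 0:
        # All values are identical: put everything in the first bin.
        counts[0] = len(values)
        return counts, width
    for v in values:
        # The maximum lands exactly on the last edge; clamp it into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts, width

toy_values = [0, 1, 2, 3, 10]
counts, width = fixed_bin_counts(toy_values, bins=5)
print(counts, width)  # [2, 2, 0, 0, 1] 2.0
```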
Value                Count  Frequency (%)
8.902102             2      < 0.1%
16.251204            1      < 0.1%
18.47242             1      < 0.1%
9.779444             1      < 0.1%
10.142218            1      < 0.1%
1.569246             1      < 0.1%
27.65527             1      < 0.1%
121.463076           1      < 0.1%
16.032594            1      < 0.1%
0.118324             1      < 0.1%
80.171283            1      < 0.1%
36.238968            1      < 0.1%
24.657931            1      < 0.1%
44.529429            1      < 0.1%
8.265317             1      < 0.1%
10.439971            1      < 0.1%
93.067866            1      < 0.1%
55.659988            1      < 0.1%
33.649652            1      < 0.1%
4.289003             1      < 0.1%
0.887821             1      < 0.1%
1.71729              1      < 0.1%
39.004588            1      < 0.1%
19.524972            1      < 0.1%
1.777148             1      < 0.1%
Other values (4777)  4777   99.5%
 
Value     Count  Frequency (%)
0         1      < 0.1%
0.000372  1      < 0.1%
0.001117  1      < 0.1%
0.001186  1      < 0.1%
0.001389  1      < 0.1%
0.001586  1      < 0.1%
0.002386  1      < 0.1%
0.002388  1      < 0.1%
0.003142  1      < 0.1%
0.003352  1      < 0.1%
 
Value       Count  Frequency (%)
875.581305  1      < 0.1%
724.247784  1      < 0.1%
514.569956  1      < 0.1%
481.098624  1      < 0.1%
434.278564  1      < 0.1%
418.708552  1      < 0.1%
271.972889  1      < 0.1%
243.791743  1      < 0.1%
206.227151  1      < 0.1%
203.73459   1      < 0.1%
 

production_companies
Categorical

HIGH CARDINALITY

Distinct: 3697
Distinct (%): 77.0%
Missing: 0
Missing (%): 0.0%
Memory size: 37.6 KiB
[]
 
351
[{'name': 'Paramount Pictures', 'id': 4}]
 
58
[{'name': 'Universal Pictures', 'id': 33}]
 
45
[{'name': 'New Line Cinema', 'id': 12}]
 
38
[{'name': 'Columbia Pictures', 'id': 5}]
 
37
Other values (3692)
4274 
Value  Count  Frequency (%)
[]  351  7.3%
[{'name': 'Paramount Pictures', 'id': 4}]  58  1.2%
[{'name': 'Universal Pictures', 'id': 33}]  45  0.9%
[{'name': 'New Line Cinema', 'id': 12}]  38  0.8%
[{'name': 'Columbia Pictures', 'id': 5}]  37  0.8%
[{'name': 'Metro-Goldwyn-Mayer (MGM)', 'id': 8411}]  32  0.7%
[{'name': 'Twentieth Century Fox Film Corporation', 'id': 306}]  31  0.6%
[{'name': 'Warner Bros.', 'id': 6194}]  27  0.6%
[{'name': 'Walt Disney Pictures', 'id': 2}]  27  0.6%
[{'name': 'Touchstone Pictures', 'id': 9195}]  26  0.5%
[{'name': 'Dimension Films', 'id': 7405}]  17  0.4%
[{'name': 'Miramax Films', 'id': 14}]  16  0.3%
[{'name': 'Columbia Pictures Corporation', 'id': 441}]  16  0.3%
[{'name': 'DreamWorks Animation', 'id': 521}]  12  0.2%
[{'name': 'United Artists', 'id': 60}]  12  0.2%
[{'name': 'Walt Disney Pictures', 'id': 2}, {'name': 'Pixar Animation Studios', 'id': 3}]  11  0.2%
[{'name': 'Fox 2000 Pictures', 'id': 711}]  10  0.2%
[{'name': 'Fox Searchlight Pictures', 'id': 43}]  9  0.2%
[{'name': 'Imagine Entertainment', 'id': 23}, {'name': 'Universal Pictures', 'id': 33}]  9  0.2%
[{'name': 'Walt Disney Pictures', 'id': 2}, {'name': 'Walt Disney Feature Animation', 'id': 10217}]  9  0.2%
[{'name': 'Lions Gate Films', 'id': 35}]  8  0.2%
[{'name': 'Blue Sky Studios', 'id': 9383}, {'name': 'Twentieth Century Fox Animation', 'id': 11749}]  8  0.2%
[{'name': 'Marvel Studios', 'id': 420}]  8  0.2%
[{'name': 'United Artists', 'id': 60}, {'name': 'Eon Productions', 'id': 7576}, {'name': 'Danjaq', 'id': 10761}]  7  0.1%
[{'name': 'Hollywood Pictures', 'id': 915}, {'name': 'Cinergi Pictures Entertainment', 'id': 1504}]  7  0.1%
Other values (3672)  3972  82.7%
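
Each value in this column is a string that looks like a Python list of `{'name': ..., 'id': ...}` dictionaries, so the profile counts whole lists rather than individual companies, which contributes to the high cardinality. The sketch below shows one possible way to unpack such strings with `ast.literal_eval` and count films per company. It assumes every cell is a valid Python literal in the form seen above; the helper name and the four sample cells are illustrative.

```python
import ast
from collections import Counter

def company_names(cell):
    """Parse one stringified list of {'name', 'id'} dicts into a list of names."""
    return [entry["name"] for entry in ast.literal_eval(cell)]

# Sample cells in the format shown in the table above.
cells = [
    "[]",
    "[{'name': 'Paramount Pictures', 'id': 4}]",
    "[{'name': 'Walt Disney Pictures', 'id': 2}, "
    "{'name': 'Pixar Animation Studios', 'id': 3}]",
    "[{'name': 'Walt Disney Pictures', 'id': 2}]",
]

per_company = Counter(name for cell in cells for name in company_names(cell))
for name, count in per_company.most_common():
    print(name, count)
```

`ast.literal_eval` only evaluates literals, which makes it a safer choice than `eval` for data of unknown origin.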
 
Frequencies of value counts

Unique

Unique: 3497
Unique (%): 72.8%
Histogram of lengths of the category

Length

Max length: 1130
Median length: 104
Mean length: 127.542994
Min length: 2

Overview of Unicode Properties

Unique unicode characters: 106
Unique unicode categories: 11
Unique unicode scripts: 2
Unique unicode blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
'8199313.4%
 
7093511.6%
 
i345265.6%
 
e338415.5%
 
n326055.3%
 
a283124.6%
 
:273554.5%
 
,229763.8%
 
m222543.6%
 
d198593.2%
 
t183863.0%
 
r171812.8%
 
o155602.5%
 
{136772.2%
 
}136772.2%
 
s127622.1%
 
l93471.5%
 
u92951.5%
 
184241.4%
 
c75571.2%
 
264541.1%
 
P61671.0%
 
359351.0%
 
455200.9%
 
549410.8%
 
Other values (81)8305013.6%
 

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   278719  45.5%
Other Punctuation  133646  21.8%
Space Separator    70935   11.6%
Decimal Number     53365   8.7%
Uppercase Letter   37442   6.1%
Open Punctuation   18933   3.1%
Close Punctuation  18933   3.1%
Dash Punctuation   500     0.1%
Math Symbol        113     < 0.1%
Other Number       2       < 0.1%
Other Symbol       1       < 0.1%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
{1367772.2%
 
[480325.4%
 
(4532.4%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
'8199361.4%
 
:2735520.5%
 
,2297617.2%
 
.8350.6%
 
/1760.1%
 
&1610.1%
 
"1420.1%
 
!4< 0.1%
 
@2< 0.1%
 
?1< 0.1%
 
%1< 0.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
i3452612.4%
 
e3384112.1%
 
n3260511.7%
 
a2831210.2%
 
m222548.0%
 
d198597.1%
 
t183866.6%
 
r171816.2%
 
o155605.6%
 
s127624.6%
 
l93473.4%
 
u92953.3%
 
c75572.7%
 
y26481.0%
 
h25200.9%
 
p23870.9%
 
g20990.8%
 
v14780.5%
 
k13960.5%
 
w13010.5%
 
b12060.4%
 
f7020.3%
 
x6990.3%
 
é2940.1%
 
z1950.1%
 
Other values (18)3090.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
70935100.0%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
P616716.5%
 
F443611.8%
 
C37009.9%
 
M24296.5%
 
E23376.2%
 
S22416.0%
 
T16444.4%
 
B15664.2%
 
A14273.8%
 
G14093.8%
 
D13533.6%
 
W12523.3%
 
I12233.3%
 
R11563.1%
 
L11143.0%
 
N7632.0%
 
H6621.8%
 
U5611.5%
 
V5471.5%
 
K5161.4%
 
O3811.0%
 
J2290.6%
 
Z1400.4%
 
Y880.2%
 
Q450.1%
 
Other values (5)560.1%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
1842415.8%
 
2645412.1%
 
3593511.1%
 
4552010.3%
 
549419.3%
 
646838.8%
 
044988.4%
 
743468.1%
 
842918.0%
 
942738.0%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
}1367772.2%
 
]480325.4%
 
)4532.4%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-500100.0%
 

Most frequent Other Number characters

ValueCountFrequency (%) 
²150.0%
 
½150.0%
 

Most frequent Math Symbol characters

ValueCountFrequency (%) 
+113100.0%
 

Most frequent Other Symbol characters

ValueCountFrequency (%) 
°1100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin31616151.6%
 
Common29642848.4%
 

Most frequent Common characters

ValueCountFrequency (%) 
'8199327.7%
 
7093523.9%
 
:273559.2%
 
,229767.8%
 
{136774.6%
 
}136774.6%
 
184242.8%
 
264542.2%
 
359352.0%
 
455201.9%
 
549411.7%
 
[48031.6%
 
]48031.6%
 
646831.6%
 
044981.5%
 
743461.5%
 
842911.4%
 
942731.4%
 
.8350.3%
 
-5000.2%
 
(4530.2%
 
)4530.2%
 
/1760.1%
 
&1610.1%
 
"142< 0.1%
 
Other values (8)124< 0.1%
 

Most frequent Latin characters

ValueCountFrequency (%) 
i3452610.9%
 
e3384110.7%
 
n3260510.3%
 
a283129.0%
 
m222547.0%
 
d198596.3%
 
t183865.8%
 
r171815.4%
 
o155604.9%
 
s127624.0%
 
l93473.0%
 
u92952.9%
 
c75572.4%
 
P61672.0%
 
F44361.4%
 
C37001.2%
 
y26480.8%
 
h25200.8%
 
M24290.8%
 
p23870.8%
 
E23370.7%
 
S22410.7%
 
g20990.7%
 
T16440.5%
 
B15660.5%
 
Other values (48)205026.5%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII61213699.9%
 
None4530.1%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
'8199313.4%
 
7093511.6%
 
i345265.6%
 
e338415.5%
 
n326055.3%
 
a283124.6%
 
:273554.5%
 
,229763.8%
 
m222543.6%
 
d198593.2%
 
t183863.0%
 
r171812.8%
 
o155602.5%
 
{136772.2%
 
}136772.2%
 
s127622.1%
 
l93471.5%
 
u92951.5%
 
184241.4%
 
c75571.2%
 
264541.1%
 
P61671.0%
 
359351.0%
 
455200.9%
 
549410.8%
 
Other values (57)8259713.5%
 

Most frequent None characters

ValueCountFrequency (%) 
é29464.9%
 
ó306.6%
 
í173.8%
 
ö163.5%
 
ñ153.3%
 
è122.6%
 
á122.6%
 
ä112.4%
 
É102.2%
 
ü102.2%
 
ô40.9%
 
ç40.9%
 
ã30.7%
 
ú30.7%
 
à20.4%
 
õ20.4%
 
²10.2%
 
ï10.2%
 
Î10.2%
 
°10.2%
 
Ö10.2%
 
½10.2%
 
ě10.2%
 
Á10.2%
 

production_countries
Categorical

HIGH CARDINALITY

Distinct469
Distinct (%)9.8%
Missing0
Missing (%)0.0%
Memory size37.6 KiB
[{'iso_3166_1': 'US', 'name': 'United States of America'}]
2977 
[{'iso_3166_1': 'GB', 'name': 'United Kingdom'}, {'iso_3166_1': 'US', 'name': 'United States of America'}]
 
181
[]
 
174
[{'iso_3166_1': 'GB', 'name': 'United Kingdom'}]
 
131
[{'iso_3166_1': 'DE', 'name': 'Germany'}, {'iso_3166_1': 'US', 'name': 'United States of America'}]
 
119
Other values (464)
1221 
ValueCountFrequency (%) 
[{'iso_3166_1': 'US', 'name': 'United States of America'}]297762.0%
 
[{'iso_3166_1': 'GB', 'name': 'United Kingdom'}, {'iso_3166_1': 'US', 'name': 'United States of America'}]1813.8%
 
[]1743.6%
 
[{'iso_3166_1': 'GB', 'name': 'United Kingdom'}]1312.7%
 
[{'iso_3166_1': 'DE', 'name': 'Germany'}, {'iso_3166_1': 'US', 'name': 'United States of America'}]1192.5%
 
[{'iso_3166_1': 'CA', 'name': 'Canada'}, {'iso_3166_1': 'US', 'name': 'United States of America'}]881.8%
 
[{'iso_3166_1': 'FR', 'name': 'France'}]491.0%
 
[{'iso_3166_1': 'AU', 'name': 'Australia'}, {'iso_3166_1': 'US', 'name': 'United States of America'}]461.0%
 
[{'iso_3166_1': 'CA', 'name': 'Canada'}]461.0%
 
[{'iso_3166_1': 'FR', 'name': 'France'}, {'iso_3166_1': 'US', 'name': 'United States of America'}]380.8%
 
[{'iso_3166_1': 'DE', 'name': 'Germany'}, {'iso_3166_1': 'GB', 'name': 'United Kingdom'}, {'iso_3166_1': 'US', 'name': 'United States of America'}]330.7%
 
[{'iso_3166_1': 'IN', 'name': 'India'}]240.5%
 
[{'iso_3166_1': 'AU', 'name': 'Australia'}]210.4%
 
[{'iso_3166_1': 'FR', 'name': 'France'}, {'iso_3166_1': 'GB', 'name': 'United Kingdom'}, {'iso_3166_1': 'US', 'name': 'United States of America'}]170.4%
 
[{'iso_3166_1': 'JP', 'name': 'Japan'}, {'iso_3166_1': 'US', 'name': 'United States of America'}]150.3%
 
[{'iso_3166_1': 'US', 'name': 'United States of America'}, {'iso_3166_1': 'DE', 'name': 'Germany'}]150.3%
 
[{'iso_3166_1': 'DE', 'name': 'Germany'}]150.3%
 
[{'iso_3166_1': 'JP', 'name': 'Japan'}]150.3%
 
[{'iso_3166_1': 'US', 'name': 'United States of America'}, {'iso_3166_1': 'GB', 'name': 'United Kingdom'}]150.3%
 
[{'iso_3166_1': 'FR', 'name': 'France'}, {'iso_3166_1': 'GB', 'name': 'United Kingdom'}]140.3%
 
[{'iso_3166_1': 'US', 'name': 'United States of America'}, {'iso_3166_1': 'CA', 'name': 'Canada'}]140.3%
 
[{'iso_3166_1': 'CA', 'name': 'Canada'}, {'iso_3166_1': 'GB', 'name': 'United Kingdom'}]130.3%
 
[{'iso_3166_1': 'US', 'name': 'United States of America'}, {'iso_3166_1': 'AU', 'name': 'Australia'}]120.2%
 
[{'iso_3166_1': 'CN', 'name': 'China'}, {'iso_3166_1': 'HK', 'name': 'Hong Kong'}]110.2%
 
[{'iso_3166_1': 'KR', 'name': 'South Korea'}]100.2%
 
Other values (444)71014.8%
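Each cell in this column is a string holding a Python-style list of dictionaries, and the table above counts [GB, US] and [US, GB] as different values. A minimal sketch, assuming the cells can be parsed with the standard-library ast.literal_eval, shows how the ISO codes might be extracted and how an order-insensitive key would merge such pairs. The function name country_codes and the sample list are illustrative and are not part of the report.

```python
import ast
from collections import Counter
from typing import List

def country_codes(cell: str) -> List[str]:
    """Parse one production_countries cell and return its ISO 3166-1 codes."""
    records = ast.literal_eval(cell)  # parses Python literals only, no code execution
    return [record["iso_3166_1"] for record in records]

# Four cells copied from the value table above (one of them is the empty list).
sample_cells = [
    "[{'iso_3166_1': 'US', 'name': 'United States of America'}]",
    "[{'iso_3166_1': 'GB', 'name': 'United Kingdom'}, {'iso_3166_1': 'US', 'name': 'United States of America'}]",
    "[{'iso_3166_1': 'US', 'name': 'United States of America'}, {'iso_3166_1': 'GB', 'name': 'United Kingdom'}]",
    "[]",
]

# Sorting the codes makes the key independent of the order inside the cell,
# so the two GB/US orderings collapse into a single combination.
combos = Counter(tuple(sorted(country_codes(cell))) for cell in sample_cells)
```

Applied to the whole column, a key like this would probably lower the distinct count of 469, since several combinations in the table differ only in order.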
 
Frequencies of value counts

Unique

Unique353
Unique (%)7.3%
Histogram of lengths of the category

Length

Max length517
Median length58
Mean length69.92129919
Min length2

Overview of Unicode Properties

Unique unicode characters61
Unique unicode categories8
Unique unicode scripts2
Unique unicode blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
'5148815.3%
 
3379310.1%
 
e199926.0%
 
a168205.0%
 
i161874.8%
 
n131653.9%
 
_128723.8%
 
1128723.8%
 
6128723.8%
 
:128723.8%
 
t128193.8%
 
m114483.4%
 
o112943.4%
 
s106013.2%
 
U87172.6%
 
,82432.5%
 
S81752.4%
 
{64361.9%
 
364361.9%
 
}64361.9%
 
d57081.7%
 
r49641.5%
 
[48031.4%
 
]48031.4%
 
A45501.4%
 
Other values (36)174665.2%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter13387839.9%
 
Other Punctuation7260321.6%
 
Space Separator3379310.1%
 
Decimal Number321809.6%
 
Uppercase Letter280288.3%
 
Connector Punctuation128723.8%
 
Open Punctuation112393.3%
 
Close Punctuation112393.3%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
{643657.3%
 
[480342.7%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
'5148870.9%
 
:1287217.7%
 
,824311.4%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e1999214.9%
 
a1682012.6%
 
i1618712.1%
 
n131659.8%
 
t128199.6%
 
m114488.6%
 
o112948.4%
 
s106017.9%
 
d57084.3%
 
r49643.7%
 
c43853.3%
 
f39773.0%
 
g8050.6%
 
y4340.3%
 
l4010.3%
 
u2760.2%
 
p1620.1%
 
h1580.1%
 
w850.1%
 
z60< 0.1%
 
b60< 0.1%
 
x41< 0.1%
 
k29< 0.1%
 
v6< 0.1%
 
j1< 0.1%
 

Most frequent Connector Punctuation characters

ValueCountFrequency (%) 
_12872100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
11287240.0%
 
61287240.0%
 
3643620.0%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
33793100.0%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
U871731.1%
 
S817529.2%
 
A455016.2%
 
G9793.5%
 
K8022.9%
 
B7352.6%
 
C7232.6%
 
F6252.2%
 
E5111.8%
 
R4431.6%
 
D3701.3%
 
I3671.3%
 
N2360.8%
 
H1500.5%
 
J1240.4%
 
T1110.4%
 
Z1030.4%
 
M870.3%
 
P870.3%
 
L600.2%
 
O320.1%
 
X300.1%
 
W6< 0.1%
 
Y5< 0.1%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
}643657.3%
 
]480342.7%
 

Most occurring scripts

ValueCountFrequency (%) 
Common17392651.8%
 
Latin16190648.2%
 

Most frequent Common characters

ValueCountFrequency (%) 
'5148829.6%
 
3379319.4%
 
_128727.4%
 
1128727.4%
 
6128727.4%
 
:128727.4%
 
,82434.7%
 
{64363.7%
 
364363.7%
 
}64363.7%
 
[48032.8%
 
]48032.8%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e1999212.3%
 
a1682010.4%
 
i1618710.0%
 
n131658.1%
 
t128197.9%
 
m114487.1%
 
o112947.0%
 
s106016.5%
 
U87175.4%
 
S81755.0%
 
d57083.5%
 
r49643.1%
 
A45502.8%
 
c43852.7%
 
f39772.5%
 
G9790.6%
 
g8050.5%
 
K8020.5%
 
B7350.5%
 
C7230.4%
 
F6250.4%
 
E5110.3%
 
R4430.3%
 
y4340.3%
 
l4010.2%
 
Other values (24)26461.6%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII335832100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
'5148815.3%
 
3379310.1%
 
e199926.0%
 
a168205.0%
 
i161874.8%
 
n131653.9%
 
_128723.8%
 
1128723.8%
 
6128723.8%
 
:128723.8%
 
t128193.8%
 
m114483.4%
 
o112943.4%
 
s106013.2%
 
U87172.6%
 
,82432.5%
 
S81752.4%
 
{64361.9%
 
364361.9%
 
}64361.9%
 
d57081.7%
 
r49641.5%
 
[48031.4%
 
]48031.4%
 
A45501.4%
 
Other values (36)174665.2%
 

release_date
Categorical

HIGH CARDINALITY
UNIFORM

Distinct3281
Distinct (%)68.3%
Missing0
Missing (%)0.0%
Memory size37.6 KiB
2006-01-01
 
10
2002-01-01
 
8
2014-12-25
 
7
1999-10-22
 
7
2013-07-18
 
7
Other values (3276)
4764 
ValueCountFrequency (%) 
2006-01-01100.2%
 
2002-01-0180.2%
 
2014-12-2570.1%
 
1999-10-2270.1%
 
2013-07-1870.1%
 
2004-09-0370.1%
 
2007-01-0160.1%
 
2011-09-3060.1%
 
2011-09-1660.1%
 
2015-10-1660.1%
 
2005-01-0160.1%
 
2005-09-1660.1%
 
2003-01-0160.1%
 
2010-01-0150.1%
 
2008-01-0150.1%
 
1998-12-2550.1%
 
2001-09-0750.1%
 
1999-10-0850.1%
 
2006-09-0150.1%
 
2014-04-1650.1%
 
2008-10-1050.1%
 
2002-12-1350.1%
 
2005-05-1350.1%
 
2006-08-1150.1%
 
2000-09-0850.1%
 
Other values (3256)465596.9%
 
Frequencies of value counts

Unique

Unique2265
Unique (%)47.2%
Histogram of lengths of the category

Length

Max length10
Median length10
Mean length9.998542578
Min length3

Overview of Unicode Properties

Unique unicode characters14
Unique unicode categories3
Unique unicode scripts2
Unique unicode blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
01192624.8%
 
-960420.0%
 
1761315.9%
 
2673314.0%
 
935837.5%
 
315323.2%
 
815213.2%
 
514503.0%
 
614112.9%
 
413332.8%
 
713142.7%
 
U1< 0.1%
 
N1< 0.1%
 
K1< 0.1%
 

Most occurring categories

ValueCountFrequency (%) 
Decimal Number3841680.0%
 
Dash Punctuation960420.0%
 
Uppercase Letter3< 0.1%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
01192631.0%
 
1761319.8%
 
2673317.5%
 
935839.3%
 
315324.0%
 
815214.0%
 
514503.8%
 
614113.7%
 
413333.5%
 
713143.4%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-9604100.0%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
U133.3%
 
N133.3%
 
K133.3%
 

Most occurring scripts

ValueCountFrequency (%) 
Common48020> 99.9%
 
Latin3< 0.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
01192624.8%
 
-960420.0%
 
1761315.9%
 
2673314.0%
 
935837.5%
 
315323.2%
 
815213.2%
 
514503.0%
 
614112.9%
 
413332.8%
 
713142.7%
 

Most frequent Latin characters

ValueCountFrequency (%) 
U133.3%
 
N133.3%
 
K133.3%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII48023100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
01192624.8%
 
-960420.0%
 
1761315.9%
 
2673314.0%
 
935837.5%
 
315323.2%
 
815213.2%
 
514503.0%
 
614112.9%
 
413332.8%
 
713142.7%
 
U1< 0.1%
 
N1< 0.1%
 
K1< 0.1%
 

gross
Real number (ℝ≥0)

ZEROS

Distinct3297
Distinct (%)68.6%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean82260638.65
Minimum0
Maximum2787965087
Zeros1427
Zeros (%)29.7%
Memory size37.6 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median19170001
Q392917187
95-th percentile369284902.7
Maximum2787965087
Range2787965087
Interquartile range (IQR)92917187

Descriptive statistics

Standard deviation162857100.9
Coefficient of variation (CV)1.979769469
Kurtosis33.12362966
Mean82260638.65
Median Absolute Deviation (MAD)19170001
Skewness4.444716448
Sum3.950978474e+11
Variance2.652243533e+16
MonotonicityNot monotonic
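Several of the figures above follow from one another, which gives a quick consistency check. The sketch below uses only numbers copied from this section; the variable names are illustrative. It assumes the coefficient of variation is the standard deviation divided by the mean and that the IQR is Q3 minus Q1, both of which agree with the reported values.

```python
import math

# Values copied from the gross summary above.
n = 4803                 # number of observations
zeros = 1427             # observations equal to zero
mean = 82260638.65
std = 162857100.9
q1 = 0
q3 = 92917187

cv = std / mean          # coefficient of variation, about 1.98
iqr = q3 - q1            # interquartile range
variance = std ** 2      # about 2.65e16
total = mean * n         # sum of the column, about 3.95e11
zero_share = zeros / n   # fraction of zeros, about 0.297

print(f"CV={cv:.6f} IQR={iqr} variance={variance:.6e} "
      f"sum={total:.6e} zeros={zero_share:.1%}")
```

Because Q1 is 0, the IQR equals Q3, and the large share of zeros is one reason the mean sits far above the median.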
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
0142729.7%
 
700000060.1%
 
800000060.1%
 
600000050.1%
 
1200000050.1%
 
1000000050.1%
 
10000000050.1%
 
1400000040.1%
 
2500000040.1%
 
1100000040.1%
 
500000040.1%
 
3200000030.1%
 
1300000030.1%
 
6000000030.1%
 
780000030.1%
 
1440000030.1%
 
400000030.1%
 
1700000030.1%
 
3000000030.1%
 
770000002< 0.1%
 
200000002< 0.1%
 
290000002< 0.1%
 
420000002< 0.1%
 
22000002< 0.1%
 
85000002< 0.1%
 
Other values (3272)329268.5%
 
ValueCountFrequency (%) 
0142729.7%
 
51< 0.1%
 
72< 0.1%
 
101< 0.1%
 
112< 0.1%
 
122< 0.1%
 
131< 0.1%
 
141< 0.1%
 
151< 0.1%
 
161< 0.1%
 
ValueCountFrequency (%) 
27879650871< 0.1%
 
18450341881< 0.1%
 
15195579101< 0.1%
 
15135288101< 0.1%
 
15062493601< 0.1%
 
14054036941< 0.1%
 
12742190091< 0.1%
 
12154399941< 0.1%
 
11567309621< 0.1%
 
11533044951< 0.1%
 

duration
Unsupported

REJECTED
UNSUPPORTED

Missing0
Missing (%)0.0%
Memory size150.3 KiB

spoken_languages
Categorical

HIGH CARDINALITY

Distinct544
Distinct (%)11.3%
Missing0
Missing (%)0.0%
Memory size37.6 KiB
[{'iso_639_1': 'en', 'name': 'English'}]
3171 
[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'es', 'name': 'Español'}]
 
127
[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'fr', 'name': 'Français'}]
 
114
[]
 
86
[{'iso_639_1': 'es', 'name': 'Español'}, {'iso_639_1': 'en', 'name': 'English'}]
 
54
Other values (539)
1251 
ValueCountFrequency (%) 
[{'iso_639_1': 'en', 'name': 'English'}]317166.0%
 
[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'es', 'name': 'Español'}]1272.6%
 
[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'fr', 'name': 'Français'}]1142.4%
 
[]861.8%
 
[{'iso_639_1': 'es', 'name': 'Español'}, {'iso_639_1': 'en', 'name': 'English'}]541.1%
 
[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'de', 'name': 'Deutsch'}]531.1%
 
[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'it', 'name': 'Italiano'}]511.1%
 
[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'ru', 'name': 'Pусский'}]501.0%
 
[{'iso_639_1': 'fr', 'name': 'Français'}]491.0%
 
[{'iso_639_1': 'es', 'name': 'Español'}]230.5%
 
[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'ja', 'name': '日本語'}]230.5%
 
[{'iso_639_1': 'fr', 'name': 'Français'}, {'iso_639_1': 'en', 'name': 'English'}]230.5%
 
[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'pt', 'name': 'Português'}]220.5%
 
[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'pl', 'name': 'Polski'}]220.5%
 
[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'zh', 'name': '普通话'}]190.4%
 
[{'iso_639_1': 'it', 'name': 'Italiano'}, {'iso_639_1': 'en', 'name': 'English'}]170.4%
 
[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'th', 'name': 'ภาษาไทย'}]150.3%
 
[{'iso_639_1': 'hi', 'name': 'हिन्दी'}]150.3%
 
[{'iso_639_1': 'cs', 'name': 'Český'}, {'iso_639_1': 'en', 'name': 'English'}]140.3%
 
[{'iso_639_1': 'de', 'name': 'Deutsch'}]140.3%
 
[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'fr', 'name': 'Français'}, {'iso_639_1': 'de', 'name': 'Deutsch'}]140.3%
 
[{'iso_639_1': 'zh', 'name': '普通话'}]130.3%
 
[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'fr', 'name': 'Français'}, {'iso_639_1': 'it', 'name': 'Italiano'}]120.2%
 
[{'iso_639_1': 'ru', 'name': 'Pусский'}]120.2%
 
[{'iso_639_1': 'de', 'name': 'Deutsch'}, {'iso_639_1': 'en', 'name': 'English'}]120.2%
 
Other values (519)77816.2%
 
Frequencies of value counts

Unique

Unique416
Unique (%)8.7%
Histogram of lengths of the category

Length

Max length350
Median length40
Mean length57.68394753
Min length2

Overview of Unicode Properties

Unique unicode characters177
Unique unicode categories11
Unique unicode scripts15
Unique unicode blocks15
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
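As an illustration of such per-character properties, the sketch below counts general categories with Python's standard-library unicodedata module. This is not necessarily the tool that produced this report, and unicodedata exposes categories but not scripts or blocks. The function name category_counts and the CATEGORY_NAMES mapping are illustrative; the sample cell is the most frequent value of this column.

```python
import unicodedata
from collections import Counter
from typing import Iterable

# Readable names for the two-letter category codes used below.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter", "Lu": "Uppercase Letter",
    "Po": "Other Punctuation", "Ps": "Open Punctuation",
    "Pe": "Close Punctuation", "Pc": "Connector Punctuation",
    "Nd": "Decimal Number", "Zs": "Space Separator",
}

def category_counts(values: Iterable[str]) -> Counter:
    """Count the Unicode general category of every character in the values."""
    counts: Counter = Counter()
    for text in values:
        for ch in text:
            counts[unicodedata.category(ch)] += 1
    return counts

cell = "[{'iso_639_1': 'en', 'name': 'English'}]"
counts = category_counts([cell])

for code, count in counts.most_common():
    print(CATEGORY_NAMES.get(code, code), count)
```

The sample cell has 40 characters, which matches the median length reported for this column.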

Most occurring characters

ValueCountFrequency (%) 
'5549620.0%
 
231518.4%
 
n167686.1%
 
_138745.0%
 
:138745.0%
 
s132214.8%
 
i125374.5%
 
e124864.5%
 
,91573.3%
 
a90553.3%
 
o77122.8%
 
m69692.5%
 
{69372.5%
 
669372.5%
 
369372.5%
 
969372.5%
 
169372.5%
 
}69372.5%
 
l52591.9%
 
h50471.8%
 
E48401.7%
 
[48031.7%
 
]48031.7%
 
g46411.7%
 
r13550.5%
 
Other values (152)103863.7%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter10128536.6%
 
Other Punctuation7861228.4%
 
Decimal Number2774810.0%
 
Space Separator231518.4%
 
Connector Punctuation138745.0%
 
Open Punctuation117404.2%
 
Close Punctuation117404.2%
 
Uppercase Letter63362.3%
 
Other Letter23010.8%
 
Nonspacing Mark1560.1%
 
Spacing Mark113< 0.1%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
{693759.1%
 
[480340.9%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
'5549670.6%
 
:1387417.6%
 
,915711.6%
 
/790.1%
 
?6< 0.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
n1676816.6%
 
s1322113.1%
 
i1253712.4%
 
e1248612.3%
 
a90558.9%
 
o77127.6%
 
m69696.9%
 
l52595.2%
 
h50475.0%
 
g46414.6%
 
r13551.3%
 
t9200.9%
 
u6680.7%
 
p4900.5%
 
f4670.5%
 
ç4550.4%
 
с3820.4%
 
c3530.3%
 
ñ3510.3%
 
d3070.3%
 
k2350.2%
 
к2090.2%
 
и2000.2%
 
й1940.2%
 
у1850.2%
 
Other values (44)8190.8%
 

Most frequent Connector Punctuation characters

ValueCountFrequency (%) 
_13874100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
6693725.0%
 
3693725.0%
 
9693725.0%
 
1693725.0%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
23151100.0%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
E484076.4%
 
F4376.9%
 
P3064.8%
 
D2764.4%
 
I1883.0%
 
L540.9%
 
M420.7%
 
Č380.6%
 
T350.6%
 
N250.4%
 
V170.3%
 
R130.2%
 
S120.2%
 
У90.1%
 
K80.1%
 
A70.1%
 
G70.1%
 
Í50.1%
 
B50.1%
 
H40.1%
 
Z40.1%
 
C3< 0.1%
 
W1< 0.1%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
}693759.1%
 
]480340.9%
 

Most frequent Other Letter characters

ValueCountFrequency (%) 
1556.7%
 
1074.7%
 
1074.7%
 
974.2%
 
974.2%
 
974.2%
 
964.2%
 
ا944.1%
 
ر944.1%
 
803.5%
 
ل672.9%
 
ع672.9%
 
ب672.9%
 
ي672.9%
 
ة672.9%
 
482.1%
 
482.1%
 
482.1%
 
广482.1%
 
482.1%
 
482.1%
 
401.7%
 
401.7%
 
401.7%
 
401.7%
 
Other values (31)49421.5%
 

Most frequent Spacing Mark characters

ValueCountFrequency (%) 
ि4842.5%
 
4842.5%
 
ி43.5%
 
43.5%
 
43.5%
 
21.8%
 
21.8%
 
10.9%
 

Most frequent Nonspacing Mark characters

ValueCountFrequency (%) 
ִ6642.3%
 
4830.8%
 
ְ3321.2%
 
42.6%
 
42.6%
 
10.6%
 

Most occurring scripts

ValueCountFrequency (%) 
Common16686560.2%
 
Latin10619638.3%
 
Cyrillic12580.5%
 
Han9000.3%
 
Arabic5970.2%
 
Devanagari2880.1%
 
Thai2800.1%
 
Hebrew2640.1%
 
Hangul1860.1%
 
Greek1600.1%
 
Gurmukhi24< 0.1%
 
Tamil20< 0.1%
 
Georgian7< 0.1%
 
Telugu6< 0.1%
 
Bengali5< 0.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
'5549633.3%
 
2315113.9%
 
_138748.3%
 
:138748.3%
 
,91575.5%
 
{69374.2%
 
669374.2%
 
369374.2%
 
969374.2%
 
169374.2%
 
}69374.2%
 
[48032.9%
 
]48032.9%
 
/79< 0.1%
 
?6< 0.1%
 

Most frequent Latin characters

ValueCountFrequency (%) 
n1676815.8%
 
s1322112.4%
 
i1253711.8%
 
e1248611.8%
 
a90558.5%
 
o77127.3%
 
m69696.6%
 
l52595.0%
 
h50474.8%
 
E48404.6%
 
g46414.4%
 
r13551.3%
 
t9200.9%
 
u6680.6%
 
p4900.5%
 
f4670.4%
 
ç4550.4%
 
F4370.4%
 
c3530.3%
 
ñ3510.3%
 
d3070.3%
 
P3060.3%
 
D2760.3%
 
k2350.2%
 
I1880.2%
 
Other values (35)8530.8%
 

Most frequent Greek characters

ValueCountFrequency (%) 
λ4025.0%
 
ε2012.5%
 
η2012.5%
 
ν2012.5%
 
ι2012.5%
 
κ2012.5%
 
ά2012.5%
 

Most frequent Han characters

ValueCountFrequency (%) 
15517.2%
 
10711.9%
 
10711.9%
 
9710.8%
 
9710.8%
 
9710.8%
 
9610.7%
 
广485.3%
 
485.3%
 
485.3%
 

Most frequent Thai characters

ValueCountFrequency (%) 
8028.6%
 
4014.3%
 
4014.3%
 
4014.3%
 
4014.3%
 
4014.3%
 

Most frequent Cyrillic characters

ValueCountFrequency (%) 
с38230.4%
 
к20916.6%
 
и20015.9%
 
й19415.4%
 
у18514.7%
 
а161.3%
 
р121.0%
 
У90.7%
 
ї90.7%
 
н90.7%
 
ь90.7%
 
з50.4%
 
қ40.3%
 
б30.2%
 
ъ30.2%
 
л30.2%
 
г30.2%
 
е30.2%
 

Most frequent Devanagari characters

ValueCountFrequency (%) 
4816.7%
 
ि4816.7%
 
4816.7%
 
4816.7%
 
4816.7%
 
4816.7%
 

Most frequent Arabic characters

ValueCountFrequency (%) 
ا9415.7%
 
ر9415.7%
 
ل6711.2%
 
ع6711.2%
 
ب6711.2%
 
ي6711.2%
 
ة6711.2%
 
و172.8%
 
د152.5%
 
ف122.0%
 
س122.0%
 
ی122.0%
 
پ20.3%
 
ښ20.3%
 
ت20.3%
 

Most frequent Hangul characters

ValueCountFrequency (%) 
3116.7%
 
3116.7%
 
3116.7%
 
3116.7%
 
3116.7%
 
3116.7%
 

Most frequent Tamil characters

ValueCountFrequency (%) 
420.0%
 
420.0%
 
ி420.0%
 
420.0%
 
420.0%
 

Most frequent Hebrew characters

ValueCountFrequency (%) 
ִ6625.0%
 
ע3312.5%
 
ב3312.5%
 
ְ3312.5%
 
ר3312.5%
 
י3312.5%
 
ת3312.5%
 

Most frequent Gurmukhi characters

ValueCountFrequency (%) 
416.7%
 
416.7%
 
416.7%
 
416.7%
 
416.7%
 
416.7%
 

Most frequent Telugu characters

ValueCountFrequency (%) 
233.3%
 
116.7%
 
116.7%
 
116.7%
 
116.7%
 

Most frequent Georgian characters

ValueCountFrequency (%) 
114.3%
 
114.3%
 
114.3%
 
114.3%
 
114.3%
 
114.3%
 
114.3%
 

Most frequent Bengali characters

ValueCountFrequency (%) 
240.0%
 
120.0%
 
120.0%
 
120.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII27202398.2%
 
Cyrillic12580.5%
 
None11640.4%
 
CJK9000.3%
 
Arabic5970.2%
 
Devanagari2880.1%
 
Thai2800.1%
 
Hebrew2640.1%
 
Hangul1860.1%
 
Latin Ext Additional34< 0.1%
 
Gurmukhi24< 0.1%
 
Tamil20< 0.1%
 
Georgian7< 0.1%
 
Telugu6< 0.1%
 
Bengali5< 0.1%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
'5549620.4%
 
231518.5%
 
n167686.2%
 
_138745.1%
 
:138745.1%
 
s132214.9%
 
i125374.6%
 
e124864.6%
 
,91573.4%
 
a90553.3%
 
o77122.8%
 
m69692.6%
 
{69372.6%
 
669372.6%
 
369372.6%
 
969372.6%
 
169372.6%
 
}69372.6%
 
l52591.9%
 
h50471.9%
 
E48401.8%
 
[48031.8%
 
]48031.8%
 
g46411.7%
 
r13550.5%
 
Other values (36)53532.0%
 

Most frequent None characters

ValueCountFrequency (%) 
ç45539.1%
 
ñ35130.2%
 
ê685.8%
 
λ403.4%
 
Č383.3%
 
ý383.3%
 
ε201.7%
 
η201.7%
 
ν201.7%
 
ι201.7%
 
κ201.7%
 
ά201.7%
 
ü181.5%
 
â131.1%
 
ă131.1%
 
Í50.4%
 
č30.3%
 
à10.1%
 
š10.1%
 

Most frequent CJK characters

ValueCountFrequency (%) 
15517.2%
 
10711.9%
 
10711.9%
 
9710.8%
 
9710.8%
 
9710.8%
 
9610.7%
 
广485.3%
 
485.3%
 
485.3%
 

Most frequent Thai characters

ValueCountFrequency (%) 
8028.6%
 
4014.3%
 
4014.3%
 
4014.3%
 
4014.3%
 
4014.3%
 

Most frequent Cyrillic characters

ValueCountFrequency (%) 
с38230.4%
 
к20916.6%
 
и20015.9%
 
й19415.4%
 
у18514.7%
 
а161.3%
 
р121.0%
 
У90.7%
 
ї90.7%
 
н90.7%
 
ь90.7%
 
з50.4%
 
қ40.3%
 
б30.2%
 
ъ30.2%
 
л30.2%
 
г30.2%
 
е30.2%
 

Most frequent Devanagari characters

ValueCountFrequency (%) 
4816.7%
 
ि4816.7%
 
4816.7%
 
4816.7%
 
4816.7%
 
4816.7%
 

Most frequent Arabic characters

ValueCountFrequency (%) 
ا9415.7%
 
ر9415.7%
 
ل6711.2%
 
ع6711.2%
 
ب6711.2%
 
ي6711.2%
 
ة6711.2%
 
و172.8%
 
د152.5%
 
ف122.0%
 
س122.0%
 
ی122.0%
 
پ20.3%
 
ښ20.3%
 
ت20.3%
 

Most frequent Hangul characters

ValueCountFrequency (%) 
3116.7%
 
3116.7%
 
3116.7%
 
3116.7%
 
3116.7%
 
3116.7%
 

Most frequent Tamil characters

ValueCountFrequency (%) 
420.0%
 
420.0%
 
ி420.0%
 
420.0%
 
420.0%
 

Most frequent Hebrew characters

ValueCountFrequency (%) 
ִ6625.0%
 
ע3312.5%
 
ב3312.5%
 
ְ3312.5%
 
ר3312.5%
 
י3312.5%
 
ת3312.5%
 

Most frequent Latin Ext Additional characters

ValueCountFrequency (%) 
ế1750.0%
 
1750.0%
 

Most frequent Gurmukhi characters

ValueCountFrequency (%) 
416.7%
 
416.7%
 
416.7%
 
416.7%
 
416.7%
 
416.7%
 

Most frequent Telugu characters

ValueCountFrequency (%) 
233.3%
 
116.7%
 
116.7%
 
116.7%
 
116.7%
 

Most frequent Georgian characters

ValueCountFrequency (%) 
114.3%
 
114.3%
 
114.3%
 
114.3%
 
114.3%
 
114.3%
 
114.3%
 

Most frequent Bengali characters

ValueCountFrequency (%) 
240.0%
 
120.0%
 
120.0%
 
120.0%
 

status
Categorical

Distinct3
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size37.6 KiB
Released
4795 
Rumored
 
5
Post Production
 
3
ValueCountFrequency (%) 
Released479599.8%
 
Rumored50.1%
 
Post Production30.1%
 
Frequencies of value counts

Unique

Unique0
Unique (%)0.0%
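The report lists both a Distinct and a Unique count. A minimal sketch, assuming Distinct is the number of different values and Unique is the number of values that occur exactly once (an interpretation that agrees with the figures for this column), rebuilds the status column from the frequency table above:

```python
from collections import Counter

# Column rebuilt from the frequency table above.
status = ["Released"] * 4795 + ["Rumored"] * 5 + ["Post Production"] * 3

counts = Counter(status)
distinct = len(counts)                               # different values
unique = sum(1 for c in counts.values() if c == 1)   # values seen exactly once
distinct_pct = 100 * distinct / len(status)
mean_length = sum(len(s) for s in status) / len(status)

print(distinct, unique, round(distinct_pct, 1), round(mean_length, 6))
```

All three values occur more than once, so Unique is 0 even though Distinct is 3.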
Histogram of lengths of the category

Length

Max length15
Median length8
Mean length8.003331251
Min length7

Overview of Unicode Properties

Unique unicode characters16
Unique unicode categories3
Unique unicode scripts2
Unique unicode blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
e1439037.4%
 
d480312.5%
 
R480012.5%
 
s479812.5%
 
l479512.5%
 
a479512.5%
 
o14< 0.1%
 
r8< 0.1%
 
u8< 0.1%
 
P6< 0.1%
 
t6< 0.1%
 
m5< 0.1%
 
3< 0.1%
 
c3< 0.1%
 
i3< 0.1%
 
n3< 0.1%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter3363187.5%
 
Uppercase Letter480612.5%
 
Space Separator3< 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
R480099.9%
 
P60.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e1439042.8%
 
d480314.3%
 
s479814.3%
 
l479514.3%
 
a479514.3%
 
o14< 0.1%
 
r8< 0.1%
 
u8< 0.1%
 
t6< 0.1%
 
m5< 0.1%
 
c3< 0.1%
 
i3< 0.1%
 
n3< 0.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
3100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin38437> 99.9%
 
Common3< 0.1%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e1439037.4%
 
d480312.5%
 
R480012.5%
 
s479812.5%
 
l479512.5%
 
a479512.5%
 
o14< 0.1%
 
r8< 0.1%
 
u8< 0.1%
 
P6< 0.1%
 
t6< 0.1%
 
m5< 0.1%
 
c3< 0.1%
 
i3< 0.1%
 
n3< 0.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
3100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII38440100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
e1439037.4%
 
d480312.5%
 
R480012.5%
 
s479812.5%
 
l479512.5%
 
a479512.5%
 
o14< 0.1%
 
r8< 0.1%
 
u8< 0.1%
 
P6< 0.1%
 
t6< 0.1%
 
m5< 0.1%
 
3< 0.1%
 
c3< 0.1%
 
i3< 0.1%
 
n3< 0.1%
 

tagline
Categorical

HIGH CARDINALITY

Distinct3945
Distinct (%)82.1%
Missing0
Missing (%)0.0%
Memory size37.6 KiB
UNK
844 
Based on a true story.
 
3
There are two sides to every love story.
 
2
The only way out is down.
 
2
From zero to hero.
 
2
Other values (3940)
3950 
ValueCountFrequency (%) 
UNK84417.6%
 
Based on a true story.30.1%
 
There are two sides to every love story.2< 0.1%
 
The only way out is down.2< 0.1%
 
From zero to hero.2< 0.1%
 
You never forget your first love.2< 0.1%
 
Who's next?2< 0.1%
 
Who is John Galt?2< 0.1%
 
Worlds Collide2< 0.1%
 
One ordinary couple. One little white lie.2< 0.1%
 
Be careful what you wish for.2< 0.1%
 
Based on the incredible true story.2< 0.1%
 
There are no clean getaways.2< 0.1%
 
One way in. No way out.2< 0.1%
 
What could go wrong?2< 0.1%
 
The ball is back!1< 0.1%
 
One person can change your life forever1< 0.1%
 
Underworld1< 0.1%
 
Believe The Unbelievable1< 0.1%
 
Suffering? You Haven't Seen Anything Yet...1< 0.1%
 
The Most Seductive Evil of All Time Has Now Been Unleashed in Ours.1< 0.1%
 
Gangway...For This Years BIG Adventure!1< 0.1%
 
Be careful what you wish for...1< 0.1%
 
What would you go back for?1< 0.1%
 
She's the one in every family.1< 0.1%
 
Other values (3920)392081.6%
 
Frequencies of value counts

Unique

Unique3930
Unique (%)81.8%
Histogram of lengths of the category

Length

Max length252
Median length32
Mean length35.13762232
Min length3

Overview of Unicode Properties

Unique unicode characters92
Unique unicode categories11
Unique unicode scripts3
Unique unicode blocks4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
2692616.0%
 
e1745010.3%
 
o104636.2%
 
t103586.1%
 
a87615.2%
 
n84065.0%
 
i81374.8%
 
r79334.7%
 
s76484.5%
 
h65873.9%
 
.51473.0%
 
l51323.0%
 
d39512.3%
 
u36692.2%
 
y31151.8%
 
m30891.8%
 
g27341.6%
 
c26601.6%
 
f24461.4%
 
w22521.3%
 
v18831.1%
 
p15720.9%
 
b15680.9%
 
T14920.9%
 
N12740.8%
 
Other values (67)141138.4%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter12132171.9%
 
Space Separator2692616.0%
 
Uppercase Letter122327.2%
 
Other Punctuation75634.5%
 
Decimal Number5210.3%
 
Dash Punctuation1500.1%
 
Final Punctuation28< 0.1%
 
Open Punctuation8< 0.1%
 
Close Punctuation8< 0.1%
 
Other Letter5< 0.1%
 
Currency Symbol4< 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
T149212.2%
 
N127410.4%
 
U9377.7%
 
K9317.6%
 
A8276.8%
 
S7085.8%
 
I5994.9%
 
W5924.8%
 
H5854.8%
 
B4783.9%
 
F4193.4%
 
E4113.4%
 
L4003.3%
 
O3773.1%
 
C3542.9%
 
D3322.7%
 
M3322.7%
 
Y2972.4%
 
R2412.0%
 
G2381.9%
 
P2341.9%
 
J1040.9%
 
V500.4%
 
Z110.1%
 
Q6< 0.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e1745014.4%
 
o104638.6%
 
t103588.5%
 
a87617.2%
 
n84066.9%
 
i81376.7%
 
r79336.5%
 
s76486.3%
 
h65875.4%
 
l51324.2%
 
d39513.3%
 
u36693.0%
 
y31152.6%
 
m30892.5%
 
g27342.3%
 
c26602.2%
 
f24462.0%
 
w22521.9%
 
v18831.6%
 
p15721.3%
 
b15681.3%
 
k10250.8%
 
x1950.2%
 
j1610.1%
 
z710.1%
 
Other values (3)55< 0.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
26926100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
.514768.1%
 
'103913.7%
 
,7259.6%
 
!3564.7%
 
?2202.9%
 
"200.3%
 
:140.2%
 
100.1%
 
&90.1%
 
%90.1%
 
;50.1%
 
#50.1%
 
*3< 0.1%
 
/1< 0.1%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-14999.3%
 
10.7%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
016231.1%
 
19919.0%
 
25510.6%
 
9387.3%
 
3377.1%
 
7285.4%
 
5275.2%
 
4265.0%
 
6254.8%
 
8244.6%
 

Most frequent Final Punctuation characters

ValueCountFrequency (%) 
28100.0%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
(787.5%
 
[112.5%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
)787.5%
 
]112.5%
 

Most frequent Currency Symbol characters

ValueCountFrequency (%) 
$4100.0%
 

Most frequent Other Letter characters

ValueCountFrequency (%) 
120.0%
 
120.0%
 
120.0%
 
120.0%
 
120.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin13355379.1%
 
Common3520820.9%
 
Han5< 0.1%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e1745013.1%
 
o104637.8%
 
t103587.8%
 
a87616.6%
 
n84066.3%
 
i81376.1%
 
r79335.9%
 
s76485.7%
 
h65874.9%
 
l51323.8%
 
d39513.0%
 
u36692.7%
 
y31152.3%
 
m30892.3%
 
g27342.0%
 
c26602.0%
 
f24461.8%
 
w22521.7%
 
v18831.4%
 
p15721.2%
 
b15681.2%
 
T14921.1%
 
N12741.0%
 
k10250.8%
 
U9370.7%
 
Other values (29)90116.7%
 

Most frequent Common characters

ValueCountFrequency (%) 
2692676.5%
 
.514714.6%
 
'10393.0%
 
,7252.1%
 
!3561.0%
 
?2200.6%
 
01620.5%
 
-1490.4%
 
1990.3%
 
2550.2%
 
9380.1%
 
3370.1%
 
280.1%
 
7280.1%
 
5270.1%
 
4260.1%
 
6250.1%
 
8240.1%
 
"200.1%
 
:14< 0.1%
 
10< 0.1%
 
&9< 0.1%
 
%9< 0.1%
 
(7< 0.1%
 
)7< 0.1%
 
Other values (8)210.1%
 

Most frequent Han characters

ValueCountFrequency (%) 
120.0%
 
120.0%
 
120.0%
 
120.0%
 
120.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII168720> 99.9%
 
Punctuation39< 0.1%
 
CJK5< 0.1%
 
None2< 0.1%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
2692616.0%
 
e1745010.3%
 
o104636.2%
 
t103586.1%
 
a87615.2%
 
n84065.0%
 
i81374.8%
 
r79334.7%
 
s76484.5%
 
h65873.9%
 
.51473.1%
 
l51323.0%
 
d39512.3%
 
u36692.2%
 
y31151.8%
 
m30891.8%
 
g27341.6%
 
c26601.6%
 
f24461.4%
 
w22521.3%
 
v18831.1%
 
p15720.9%
 
b15680.9%
 
T14920.9%
 
N12740.8%
 
Other values (57)140678.3%
 

Most frequent Punctuation characters

ValueCountFrequency (%) 
2871.8%
 
1025.6%
 
12.6%
 

Most frequent None characters

ValueCountFrequency (%) 
á150.0%
 
é150.0%
 

Most frequent CJK characters

ValueCountFrequency (%) 
120.0%
 
120.0%
 
120.0%
 
120.0%
 
120.0%
 

movie_title
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 4800
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 37.6 KiB

Batman: 2
Out of the Blue: 2
The Host: 2
Slither: 1
Appaloosa: 1
Other values (4795): 4795
ValueCountFrequency (%) 
Batman2< 0.1%
 
Out of the Blue2< 0.1%
 
The Host2< 0.1%
 
Slither1< 0.1%
 
Appaloosa1< 0.1%
 
Wimbledon1< 0.1%
 
Knocked Up1< 0.1%
 
Fever Pitch1< 0.1%
 
Miracle at St. Anna1< 0.1%
 
Bandidas1< 0.1%
 
The Benchwarmers1< 0.1%
 
The Bounty Hunter1< 0.1%
 
The Warlords1< 0.1%
 
Boyhood1< 0.1%
 
Killer Elite1< 0.1%
 
Kansas City1< 0.1%
 
The Grudge1< 0.1%
 
The Monkey King 21< 0.1%
 
Love & Basketball1< 0.1%
 
Kung Pow: Enter the Fist1< 0.1%
 
Krrish1< 0.1%
 
R.I.P.D.1< 0.1%
 
Miss Congeniality1< 0.1%
 
Like Crazy1< 0.1%
 
Unfriended1< 0.1%
 
Other values (4775)477599.4%
 
Frequencies of value counts

Unique

Unique: 4797
Unique (%): 99.9%
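
The gap between Distinct (4800) and Unique (4797) makes sense if "distinct" counts the different values and "unique" counts the values that occur exactly once. That reading is an assumption about the report's definitions, though it matches the numbers. A minimal Python sketch on a toy list (the helper name `distinct_and_unique` and the sample list are made up for illustration):

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of distinct values, number of values occurring exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique

# Toy sample: two titles appear twice, two appear once.
titles = ["Batman", "Batman", "The Host", "The Host", "Slither", "Appaloosa"]
print(distinct_and_unique(titles))  # (4, 2)
```

Under this reading, the three titles that appear twice explain the report: 4797 + 3 = 4800 distinct values, and 4797 + 3 × 2 = 4803 observations.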
Histogram of lengths of the category

Length

Max length: 86
Median length: 14
Mean length: 15.34915678
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 98
Unique unicode categories: 14
Unique unicode scripts: 2
Unique unicode blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
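
As a rough illustration of how such per-category counts can be obtained, the sketch below uses Python's standard `unicodedata` module. It is not necessarily how the profiling tool computes them. The helper name `category_counts` is hypothetical, and the three sample titles are taken from the value table above. Note that `unicodedata.category` returns two-letter codes (`Lu` = Uppercase Letter, `Ll` = Lowercase Letter, `Zs` = Space Separator, `Po` = Other Punctuation) rather than the long names shown in this report.

```python
import unicodedata
from collections import Counter

def category_counts(texts):
    """Count Unicode general categories over every character in every string."""
    return Counter(unicodedata.category(ch) for text in texts for ch in text)

counts = category_counts(["Batman", "R.I.P.D.", "Love & Basketball"])
for category, n in counts.most_common():
    print(category, n)
```

The standard library does not expose script or block properties, so the script and block tables in this report would need a different data source.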

Most occurring characters

ValueCountFrequency (%) 
855311.6%
 
e752510.2%
 
a46326.3%
 
o44706.1%
 
n39505.4%
 
r39465.4%
 
i37655.1%
 
t36605.0%
 
s28623.9%
 
h28523.9%
 
l24173.3%
 
d17842.4%
 
T16682.3%
 
u15062.0%
 
c11801.6%
 
g11581.6%
 
y11201.5%
 
m10601.4%
 
S10071.4%
 
f8591.2%
 
M8001.1%
 
B7561.0%
 
p6900.9%
 
D6870.9%
 
C6630.9%
 
Other values (73)1015213.8%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter5190770.4%
 
Uppercase Letter1174815.9%
 
Space Separator855311.6%
 
Other Punctuation9091.2%
 
Decimal Number4940.7%
 
Dash Punctuation820.1%
 
Open Punctuation7< 0.1%
 
Close Punctuation7< 0.1%
 
Other Number4< 0.1%
 
Currency Symbol4< 0.1%
 
Final Punctuation3< 0.1%
 
Math Symbol2< 0.1%
 
Connector Punctuation1< 0.1%
 
Other Symbol1< 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
T166814.2%
 
S10078.6%
 
M8006.8%
 
B7566.4%
 
D6875.8%
 
C6635.6%
 
A6405.4%
 
L5434.6%
 
H5414.6%
 
W5004.3%
 
P4714.0%
 
G4684.0%
 
R4674.0%
 
F4533.9%
 
I4533.9%
 
E3022.6%
 
N2652.3%
 
O2211.9%
 
J1911.6%
 
K1841.6%
 
V1421.2%
 
Y1251.1%
 
U1121.0%
 
Z410.3%
 
Q260.2%
 
Other values (2)220.2%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e752514.5%
 
a46328.9%
 
o44708.6%
 
n39507.6%
 
r39467.6%
 
i37657.3%
 
t36607.1%
 
s28625.5%
 
h28525.5%
 
l24174.7%
 
d17843.4%
 
u15062.9%
 
c11802.3%
 
g11582.2%
 
y11202.2%
 
m10602.0%
 
f8591.7%
 
p6901.3%
 
k6371.2%
 
v5801.1%
 
w4830.9%
 
b4560.9%
 
x1390.3%
 
z940.2%
 
j410.1%
 
Other values (8)410.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
8553100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
:35138.6%
 
'22124.3%
 
.14115.5%
 
,758.3%
 
&616.7%
 
!313.4%
 
?171.9%
 
/70.8%
 
#20.2%
 
*20.2%
 
·10.1%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-8097.6%
 
22.4%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
214629.6%
 
17916.0%
 
07715.6%
 
37214.6%
 
4336.7%
 
8214.3%
 
5214.3%
 
9163.2%
 
7153.0%
 
6142.8%
 

Most frequent Other Number characters

ValueCountFrequency (%) 
³125.0%
 
125.0%
 
½125.0%
 
²125.0%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
(571.4%
 
[228.6%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
)571.4%
 
]228.6%
 

Most frequent Currency Symbol characters

ValueCountFrequency (%) 
¢250.0%
 
$250.0%
 

Most frequent Math Symbol characters

ValueCountFrequency (%) 
+2100.0%
 

Most frequent Final Punctuation characters

ValueCountFrequency (%) 
3100.0%
 

Most frequent Connector Punctuation characters

ValueCountFrequency (%) 
_1100.0%
 

Most frequent Other Symbol characters

ValueCountFrequency (%) 
°1100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin6365586.3%
 
Common1006713.7%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e752511.8%
 
a46327.3%
 
o44707.0%
 
n39506.2%
 
r39466.2%
 
i37655.9%
 
t36605.7%
 
s28624.5%
 
h28524.5%
 
l24173.8%
 
d17842.8%
 
T16682.6%
 
u15062.4%
 
c11801.9%
 
g11581.8%
 
y11201.8%
 
m10601.7%
 
S10071.6%
 
f8591.3%
 
M8001.3%
 
B7561.2%
 
p6901.1%
 
D6871.1%
 
C6631.0%
 
A6401.0%
 
Other values (35)799812.6%
 

Most frequent Common characters

ValueCountFrequency (%) 
855385.0%
 
:3513.5%
 
'2212.2%
 
21461.5%
 
.1411.4%
 
-800.8%
 
1790.8%
 
0770.8%
 
,750.7%
 
3720.7%
 
&610.6%
 
4330.3%
 
!310.3%
 
8210.2%
 
5210.2%
 
?170.2%
 
9160.2%
 
7150.1%
 
6140.1%
 
/70.1%
 
(5< 0.1%
 
)5< 0.1%
 
3< 0.1%
 
¢2< 0.1%
 
+2< 0.1%
 
Other values (13)190.2%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII73696> 99.9%
 
None20< 0.1%
 
Punctuation5< 0.1%
 
Number Forms1< 0.1%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
855311.6%
 
e752510.2%
 
a46326.3%
 
o44706.1%
 
n39505.4%
 
r39465.4%
 
i37655.1%
 
t36605.0%
 
s28623.9%
 
h28523.9%
 
l24173.3%
 
d17842.4%
 
T16682.3%
 
u15062.0%
 
c11801.6%
 
g11581.6%
 
y11201.5%
 
m10601.4%
 
S10071.4%
 
f8591.2%
 
M8001.1%
 
B7561.0%
 
p6900.9%
 
D6870.9%
 
C6630.9%
 
Other values (56)1012613.7%
 

Most frequent None characters

ValueCountFrequency (%) 
é630.0%
 
¢210.0%
 
·15.0%
 
à15.0%
 
³15.0%
 
Æ15.0%
 
ü15.0%
 
½15.0%
 
ë15.0%
 
²15.0%
 
á15.0%
 
ó15.0%
 
ñ15.0%
 
°15.0%
 

Most frequent Number Forms characters

ValueCountFrequency (%) 
1100.0%
 

Most frequent Punctuation characters

ValueCountFrequency (%) 
360.0%
 
240.0%
 

vote_average
Real number (ℝ≥0)

ZEROS

Distinct: 71
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.092171559
Minimum: 0
Maximum: 10
Zeros: 63
Zeros (%): 1.3%
Memory size: 37.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 4.3
Q1: 5.6
Median: 6.2
Q3: 6.8
95th percentile: 7.6
Maximum: 10
Range: 10
Interquartile range (IQR): 1.2

Descriptive statistics

Standard deviation: 1.194612163
Coefficient of variation (CV): 0.1960897114
Kurtosis: 7.792362845
Mean: 6.092171559
Median Absolute Deviation (MAD): 0.6
Skewness: -1.959710007
Sum: 29260.7
Variance: 1.42709822
Monotonicity: Not monotonic
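
The vote_average figures above are mutually consistent under the usual definitions: CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, and sum = mean × number of observations. The following Python check assumes those are the definitions the profiling tool uses. The observation count of 4803 comes from the dataset overview.

```python
import math

# Values copied from the vote_average summary.
mean = 6.092171559
std = 1.194612163
variance = 1.42709822
q1, q3 = 5.6, 6.8
n_observations = 4803  # from the dataset overview

cv = std / mean              # coefficient of variation
iqr = q3 - q1                # interquartile range
total = mean * n_observations

print(f"CV       = {cv:.10f}")
print(f"std ** 2 = {std ** 2:.8f}")
print(f"IQR      = {iqr:.1f}")
print(f"sum      = {total:.1f}")
```

Each printed value should agree with the corresponding reported statistic up to rounding.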
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
6.5                 216     4.5%
6                   216     4.5%
6.7                 213     4.4%
6.3                 207     4.3%
6.1                 201     4.2%
6.4                 201     4.2%
6.2                 200     4.2%
6.6                 198     4.1%
5.9                 196     4.1%
5.8                 187     3.9%
7                   179     3.7%
6.8                 172     3.6%
6.9                 160     3.3%
5.7                 153     3.2%
5.5                 152     3.2%
5.6                 144     3.0%
5.4                 127     2.6%
7.3                 125     2.6%
7.1                 119     2.5%
7.2                 119     2.5%
7.4                 109     2.3%
5.3                 105     2.2%
5.2                 93      1.9%
5                   86      1.8%
7.5                 66      1.4%
Other values (46)   859     17.9%

Value               Count   Frequency (%)
0                   63      1.3%
0.5                 1       < 0.1%
1                   2       < 0.1%
1.9                 1       < 0.1%
2                   6       0.1%
2.2                 1       < 0.1%
2.3                 2       < 0.1%
2.4                 1       < 0.1%
2.6                 1       < 0.1%
2.7                 1       < 0.1%

Value               Count   Frequency (%)
10                  4       0.1%
9.5                 1       < 0.1%
9.3                 1       < 0.1%
8.5                 2       < 0.1%
8.4                 2       < 0.1%
8.3                 7       0.1%
8.2                 15      0.3%
8.1                 18      0.4%
8                   35      0.7%
7.9                 32      0.7%

num_voted_users
Real number (ℝ≥0)

ZEROS

Distinct: 1609
Distinct (%): 33.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 690.2179888
Minimum: 0
Maximum: 13752
Zeros: 62
Zeros (%): 1.3%
Memory size: 37.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 4
Q1: 54
Median: 235
Q3: 737
95th percentile: 3040.9
Maximum: 13752
Range: 13752
Interquartile range (IQR): 683

Descriptive statistics

Standard deviation: 1234.585891
Coefficient of variation (CV): 1.788689821
Kurtosis: 19.91394618
Mean: 690.2179888
Median Absolute Deviation (MAD): 214
Skewness: 3.824068535
Sum: 3315117
Variance: 1524202.322
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
0                     62      1.3%
1                     53      1.1%
2                     46      1.0%
4                     43      0.9%
3                     41      0.9%
6                     38      0.8%
8                     37      0.8%
10                    34      0.7%
11                    32      0.7%
9                     32      0.7%
7                     31      0.6%
5                     28      0.6%
15                    26      0.5%
19                    26      0.5%
12                    26      0.5%
13                    25      0.5%
16                    24      0.5%
22                    23      0.5%
34                    23      0.5%
31                    22      0.5%
24                    22      0.5%
18                    22      0.5%
17                    21      0.4%
25                    20      0.4%
26                    20      0.4%
Other values (1584)   4026    83.8%

Value                 Count   Frequency (%)
0                     62      1.3%
1                     53      1.1%
2                     46      1.0%
3                     41      0.9%
4                     43      0.9%
5                     28      0.6%
6                     38      0.8%
7                     31      0.6%
8                     37      0.8%
9                     32      0.7%

Value                 Count   Frequency (%)
13752                 1       < 0.1%
12002                 1       < 0.1%
11800                 1       < 0.1%
11776                 1       < 0.1%
10995                 1       < 0.1%
10867                 1       < 0.1%
10099                 1       < 0.1%
9742                  1       < 0.1%
9455                  1       < 0.1%
9427                  1       < 0.1%

title_year
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 150.2 KiB

country
Categorical

HIGH CARDINALITY

Distinct: 71
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 37.6 KiB

United States of America: 3102
United Kingdom: 374
Canada: 220
Germany: 200
UNK: 174
Other values (66): 733
ValueCountFrequency (%) 
United States of America310264.6%
 
United Kingdom3747.8%
 
Canada2204.6%
 
Germany2004.2%
 
UNK1743.6%
 
France1743.6%
 
Australia871.8%
 
India420.9%
 
China400.8%
 
Japan340.7%
 
Spain340.7%
 
Italy260.5%
 
Ireland220.5%
 
Mexico220.5%
 
New Zealand220.5%
 
Hong Kong220.5%
 
Czech Republic180.4%
 
Belgium170.4%
 
Denmark140.3%
 
South Korea130.3%
 
Brazil130.3%
 
Russia110.2%
 
Switzerland100.2%
 
Netherlands100.2%
 
South Africa90.2%
 
Other values (46)931.9%
 
Frequencies of value counts

Unique

Unique: 28
Unique (%): 0.6%
Histogram of lengths of the category

Length

Max length: 24
Median length: 24
Mean length: 18.37164272
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 46
Unique unicode categories: 3
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
e1031111.7%
 
t986511.2%
 
978411.1%
 
a78908.9%
 
i73218.3%
 
n47845.4%
 
d42004.8%
 
m37294.2%
 
r37164.2%
 
U36574.1%
 
o36054.1%
 
c33563.8%
 
s32463.7%
 
A32213.7%
 
S31743.6%
 
f31123.5%
 
K5850.7%
 
g4560.5%
 
C2800.3%
 
l2500.3%
 
y2450.3%
 
N2120.2%
 
G2050.2%
 
u1820.2%
 
F1780.2%
 
Other values (21)6750.8%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter6662375.5%
 
Uppercase Letter1183213.4%
 
Space Separator978411.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
U365730.9%
 
A322127.2%
 
S317426.8%
 
K5854.9%
 
C2802.4%
 
N2121.8%
 
G2051.7%
 
F1781.5%
 
I980.8%
 
B400.3%
 
J370.3%
 
R350.3%
 
H260.2%
 
M260.2%
 
Z220.2%
 
D150.1%
 
E80.1%
 
P5< 0.1%
 
L4< 0.1%
 
T4< 0.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e1031115.5%
 
t986514.8%
 
a789011.8%
 
i732111.0%
 
n47847.2%
 
d42006.3%
 
m37295.6%
 
r37165.6%
 
o36055.4%
 
c33565.0%
 
s32464.9%
 
f31124.7%
 
g4560.7%
 
l2500.4%
 
y2450.4%
 
u1820.3%
 
h1000.2%
 
p930.1%
 
z430.1%
 
w420.1%
 
b33< 0.1%
 
x24< 0.1%
 
k16< 0.1%
 
v3< 0.1%
 
j1< 0.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
9784100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin7845588.9%
 
Common978411.1%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e1031113.1%
 
t986512.6%
 
a789010.1%
 
i73219.3%
 
n47846.1%
 
d42005.4%
 
m37294.8%
 
r37164.7%
 
U36574.7%
 
o36054.6%
 
c33564.3%
 
s32464.1%
 
A32214.1%
 
S31744.0%
 
f31124.0%
 
K5850.7%
 
g4560.6%
 
C2800.4%
 
l2500.3%
 
y2450.3%
 
N2120.3%
 
G2050.3%
 
u1820.2%
 
F1780.2%
 
h1000.1%
 
Other values (20)5750.7%
 

Most frequent Common characters

ValueCountFrequency (%) 
9784100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII88239100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
e1031111.7%
 
t986511.2%
 
978411.1%
 
a78908.9%
 
i73218.3%
 
n47845.4%
 
d42004.8%
 
m37294.2%
 
r37164.2%
 
U36574.1%
 
o36054.1%
 
c33563.8%
 
s32463.7%
 
A32213.7%
 
S31743.6%
 
f31123.5%
 
K5850.7%
 
g4560.5%
 
C2800.3%
 
l2500.3%
 
y2450.3%
 
N2120.2%
 
G2050.2%
 
u1820.2%
 
F1780.2%
 
Other values (21)6750.8%
 

director_name
Categorical

HIGH CARDINALITY

Distinct: 2350
Distinct (%): 48.9%
Missing: 0
Missing (%): 0.0%
Memory size: 37.6 KiB

UNK: 30
Steven Spielberg: 27
Woody Allen: 21
Clint Eastwood: 20
Martin Scorsese: 20
Other values (2345): 4685
ValueCountFrequency (%) 
UNK300.6%
 
Steven Spielberg270.6%
 
Woody Allen210.4%
 
Clint Eastwood200.4%
 
Martin Scorsese200.4%
 
Spike Lee160.3%
 
Ridley Scott160.3%
 
Robert Rodriguez160.3%
 
Renny Harlin150.3%
 
Steven Soderbergh150.3%
 
Oliver Stone140.3%
 
Tim Burton140.3%
 
Barry Levinson130.3%
 
Joel Schumacher130.3%
 
Robert Zemeckis130.3%
 
Ron Howard130.3%
 
Brian De Palma120.2%
 
Francis Ford Coppola120.2%
 
Michael Bay120.2%
 
Tony Scott120.2%
 
Kevin Smith120.2%
 
Richard Donner110.2%
 
Joel Coen110.2%
 
Chris Columbus110.2%
 
Richard Linklater110.2%
 
Other values (2325)442392.1%
 
Frequencies of value counts

Unique

Unique: 1475
Unique (%): 30.7%
Histogram of lengths of the category

Length

Max length: 32
Median length: 13
Mean length: 13.05579846
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 83
Unique unicode categories: 5
Unique unicode scripts: 2
Unique unicode blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
e58979.4%
 
51698.2%
 
a50998.1%
 
n45237.2%
 
r42886.8%
 
o36865.9%
 
i35915.7%
 
l28874.6%
 
t22353.6%
 
s20193.2%
 
h17892.9%
 
d15172.4%
 
c13892.2%
 
m11931.9%
 
u11271.8%
 
y11031.8%
 
S9901.6%
 
k9301.5%
 
J8951.4%
 
M8581.4%
 
g8021.3%
 
R7411.2%
 
C6921.1%
 
B6421.0%
 
v6171.0%
 
Other values (58)802812.8%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter4697174.9%
 
Uppercase Letter1023016.3%
 
Space Separator51698.2%
 
Other Punctuation2540.4%
 
Dash Punctuation830.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
S9909.7%
 
J8958.7%
 
M8588.4%
 
R7417.2%
 
C6926.8%
 
B6426.3%
 
D5875.7%
 
A5305.2%
 
L4884.8%
 
G4684.6%
 
P4654.5%
 
T4374.3%
 
H4054.0%
 
W3833.7%
 
K3573.5%
 
F3533.5%
 
N2682.6%
 
E1771.7%
 
O1041.0%
 
V1011.0%
 
Z910.9%
 
I740.7%
 
U470.5%
 
Y450.4%
 
Q190.2%
 
Other values (7)130.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e589712.6%
 
a509910.9%
 
n45239.6%
 
r42889.1%
 
o36867.8%
 
i35917.6%
 
l28876.1%
 
t22354.8%
 
s20194.3%
 
h17893.8%
 
d15173.2%
 
c13893.0%
 
m11932.5%
 
u11272.4%
 
y11032.3%
 
k9302.0%
 
g8021.7%
 
v6171.3%
 
b5711.2%
 
p4440.9%
 
w4170.9%
 
f3220.7%
 
z2170.5%
 
x680.1%
 
j680.1%
 
Other values (21)1720.4%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
5169100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
.22889.8%
 
'218.3%
 
,52.0%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-83100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin5720191.2%
 
Common55068.8%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e589710.3%
 
a50998.9%
 
n45237.9%
 
r42887.5%
 
o36866.4%
 
i35916.3%
 
l28875.0%
 
t22353.9%
 
s20193.5%
 
h17893.1%
 
d15172.7%
 
c13892.4%
 
m11932.1%
 
u11272.0%
 
y11031.9%
 
S9901.7%
 
k9301.6%
 
J8951.6%
 
M8581.5%
 
g8021.4%
 
R7411.3%
 
C6921.2%
 
B6421.1%
 
v6171.1%
 
D5871.0%
 
Other values (53)710412.4%
 

Most frequent Common characters

ValueCountFrequency (%) 
516993.9%
 
.2284.1%
 
-831.5%
 
'210.4%
 
,50.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII6255299.8%
 
None1550.2%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
e58979.4%
 
51698.3%
 
a50998.2%
 
n45237.2%
 
r42886.9%
 
o36865.9%
 
i35915.7%
 
l28874.6%
 
t22353.6%
 
s20193.2%
 
h17892.9%
 
d15172.4%
 
c13892.2%
 
m11931.9%
 
u11271.8%
 
y11031.8%
 
S9901.6%
 
k9301.5%
 
J8951.4%
 
M8581.4%
 
g8021.3%
 
R7411.2%
 
C6921.1%
 
B6421.0%
 
v6171.0%
 
Other values (32)787312.6%
 

Most frequent None characters

ValueCountFrequency (%) 
é3925.2%
 
á2717.4%
 
ó1811.6%
 
ö159.7%
 
í95.8%
 
ñ74.5%
 
å63.9%
 
ç53.2%
 
š42.6%
 
É31.9%
 
ô21.3%
 
Ō21.3%
 
ï21.3%
 
ä21.3%
 
Å21.3%
 
ł21.3%
 
À10.6%
 
ø10.6%
 
ń10.6%
 
û10.6%
 
Á10.6%
 
ř10.6%
 
Ø10.6%
 
æ10.6%
 
ž10.6%
 

actor_1_name
Categorical

HIGH CARDINALITY

Distinct: 2721
Distinct (%): 56.7%
Missing: 0
Missing (%): 0.0%
Memory size: 37.6 KiB

UNK: 53
Jennifer Aniston: 15
Morgan Freeman: 13
Brad Pitt: 12
Samuel L. Jackson: 12
Other values (2716): 4698
ValueCountFrequency (%) 
UNK531.1%
 
Jennifer Aniston150.3%
 
Morgan Freeman130.3%
 
Brad Pitt120.2%
 
Samuel L. Jackson120.2%
 
Robert De Niro110.2%
 
Scarlett Johansson110.2%
 
Alec Baldwin110.2%
 
Diane Keaton110.2%
 
Charlize Theron100.2%
 
Julianne Moore100.2%
 
Josh Hutcherson100.2%
 
Gary Oldman100.2%
 
Matt Damon100.2%
 
Kate Winslet100.2%
 
Philip Seymour Hoffman100.2%
 
Gene Hackman100.2%
 
Ben Kingsley90.2%
 
Colin Firth90.2%
 
Dustin Hoffman90.2%
 
Ewan McGregor90.2%
 
Justin Long90.2%
 
Laurence Fishburne90.2%
 
Drew Barrymore90.2%
 
Gwyneth Paltrow90.2%
 
Other values (2696)450293.7%
 
Frequencies of value counts

Unique

Unique: 1923
Unique (%): 40.0%
Histogram of lengths of the category

Length

Max length: 27
Median length: 13
Mean length: 12.98313554
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 84
Unique unicode categories: 5
Unique unicode scripts: 2
Unique unicode blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
e59759.6%
 
a56479.1%
 
50658.1%
 
n46267.4%
 
r39036.3%
 
i38236.1%
 
o33715.4%
 
l30854.9%
 
t23513.8%
 
s22413.6%
 
h17852.9%
 
d12662.0%
 
y12552.0%
 
u12492.0%
 
c12021.9%
 
m11771.9%
 
M8721.4%
 
J8501.4%
 
g8011.3%
 
C7691.2%
 
S7521.2%
 
B7191.2%
 
k6551.1%
 
D6261.0%
 
R5830.9%
 
Other values (59)771012.4%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter4690875.2%
 
Uppercase Letter1017816.3%
 
Space Separator50658.1%
 
Other Punctuation1410.2%
 
Dash Punctuation660.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
M8728.6%
 
J8508.4%
 
C7697.6%
 
S7527.4%
 
B7197.1%
 
D6266.2%
 
R5835.7%
 
A5805.7%
 
K5215.1%
 
H4904.8%
 
L4754.7%
 
G4354.3%
 
P4013.9%
 
T3603.5%
 
W3583.5%
 
E2932.9%
 
F2832.8%
 
N2782.7%
 
V1281.3%
 
O1071.1%
 
I760.7%
 
U720.7%
 
Z650.6%
 
Y420.4%
 
Q300.3%
 
Other values (5)130.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e597512.7%
 
a564712.0%
 
n46269.9%
 
r39038.3%
 
i38238.1%
 
o33717.2%
 
l30856.6%
 
t23515.0%
 
s22414.8%
 
h17853.8%
 
d12662.7%
 
y12552.7%
 
u12492.7%
 
c12022.6%
 
m11772.5%
 
g8011.7%
 
k6551.4%
 
b4290.9%
 
v4260.9%
 
f3950.8%
 
w3920.8%
 
p3450.7%
 
z2100.4%
 
x830.2%
 
é580.1%
 
Other values (24)1580.3%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
5065100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
.10272.3%
 
'3726.2%
 
"21.4%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-66100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin5708691.5%
 
Common52728.5%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e597510.5%
 
a56479.9%
 
n46268.1%
 
r39036.8%
 
i38236.7%
 
o33715.9%
 
l30855.4%
 
t23514.1%
 
s22413.9%
 
h17853.1%
 
d12662.2%
 
y12552.2%
 
u12492.2%
 
c12022.1%
 
m11772.1%
 
M8721.5%
 
J8501.5%
 
g8011.4%
 
C7691.3%
 
S7521.3%
 
B7191.3%
 
k6551.1%
 
D6261.1%
 
R5831.0%
 
A5801.0%
 
Other values (54)692312.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
506596.1%
 
.1021.9%
 
-661.3%
 
'370.7%
 
"2< 0.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII6220999.8%
 
None1490.2%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
e59759.6%
 
a56479.1%
 
50658.1%
 
n46267.4%
 
r39036.3%
 
i38236.1%
 
o33715.4%
 
l30855.0%
 
t23513.8%
 
s22413.6%
 
h17852.9%
 
d12662.0%
 
y12552.0%
 
u12492.0%
 
c12021.9%
 
m11771.9%
 
M8721.4%
 
J8501.4%
 
g8011.3%
 
C7691.2%
 
S7521.2%
 
B7191.2%
 
k6551.1%
 
D6261.0%
 
R5830.9%
 
Other values (32)756112.2%
 

Most frequent None characters

ValueCountFrequency (%) 
é5838.9%
 
á1711.4%
 
í1510.1%
 
ë106.7%
 
ó85.4%
 
ü42.7%
 
å32.0%
 
ñ32.0%
 
ô32.0%
 
ç32.0%
 
Å21.3%
 
ú21.3%
 
ø21.3%
 
ć21.3%
 
ö21.3%
 
è21.3%
 
ê21.3%
 
ï21.3%
 
Á10.7%
 
î10.7%
 
ā10.7%
 
š10.7%
 
Ó10.7%
 
ś10.7%
 
č10.7%
 
Other values (2)21.3%
 

actor_2_name
Categorical

HIGH CARDINALITY

Distinct: 3096
Distinct (%): 64.5%
Missing: 0
Missing (%): 0.0%
Memory size: 37.6 KiB

UNK: 63
Marisa Tomei: 9
Ed Harris: 9
Cameron Diaz: 9
John Goodman: 8
Other values (3091): 4705
ValueCountFrequency (%) 
UNK631.3%
 
Marisa Tomei90.2%
 
Ed Harris90.2%
 
Cameron Diaz90.2%
 
John Goodman80.2%
 
Josh Brolin80.2%
 
Samuel L. Jackson80.2%
 
Forest Whitaker80.2%
 
Kevin Bacon80.2%
 
Emma Watson80.2%
 
Susan Sarandon80.2%
 
Mark Ruffalo80.2%
 
John Leguizamo70.1%
 
Zooey Deschanel70.1%
 
Woody Harrelson70.1%
 
Jon Voight70.1%
 
Steve Buscemi70.1%
 
Ralph Fiennes70.1%
 
Justin Timberlake70.1%
 
Rosario Dawson70.1%
 
Leslie Mann70.1%
 
Nick Nolte60.1%
 
Denise Richards60.1%
 
Dan Aykroyd60.1%
 
Sam Neill60.1%
 
Other values (3071)456295.0%
 
Frequencies of value counts

Unique

Unique: 2281
Unique (%): 47.5%
Histogram of lengths of the category

Length

Max length: 27
Median length: 13
Mean length: 12.97085155
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 105
Unique unicode categories: 7
Unique unicode scripts: 4
Unique unicode blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
e58479.4%
 
a57179.2%
 
50108.0%
 
n45807.4%
 
r39076.3%
 
i38616.2%
 
o34415.5%
 
l31685.1%
 
t22353.6%
 
s21753.5%
 
h18423.0%
 
d13462.2%
 
y12462.0%
 
m12262.0%
 
c11601.9%
 
u11501.8%
 
M8901.4%
 
J8121.3%
 
g7651.2%
 
C7521.2%
 
S7481.2%
 
B7441.2%
 
k6421.0%
 
D5850.9%
 
R5340.9%
 
Other values (80)791612.7%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter4695275.4%
 
Uppercase Letter1016316.3%
 
Space Separator50108.0%
 
Other Punctuation970.2%
 
Dash Punctuation740.1%
 
Decimal Number2< 0.1%
 
Nonspacing Mark1< 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
M8908.8%
 
J8128.0%
 
C7527.4%
 
S7487.4%
 
B7447.3%
 
D5855.8%
 
R5345.3%
 
A5265.2%
 
L5175.1%
 
K5095.0%
 
H4844.8%
 
P4514.4%
 
G4254.2%
 
T3713.7%
 
W3473.4%
 
E3223.2%
 
F2832.8%
 
N2782.7%
 
V1421.4%
 
O1221.2%
 
I920.9%
 
U880.9%
 
Z630.6%
 
Y460.5%
 
Q200.2%
 
Other values (9)120.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e584712.5%
 
a571712.2%
 
n45809.8%
 
r39078.3%
 
i38618.2%
 
o34417.3%
 
l31686.7%
 
t22354.8%
 
s21754.6%
 
h18423.9%
 
d13462.9%
 
y12462.7%
 
m12262.6%
 
c11602.5%
 
u11502.4%
 
g7651.6%
 
k6421.4%
 
b4781.0%
 
v4631.0%
 
p3970.8%
 
w3710.8%
 
f3400.7%
 
z2360.5%
 
x1000.2%
 
é610.1%
 
Other values (38)1980.4%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
5010100.0%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-74100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
.7375.3%
 
'2323.7%
 
,11.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
5150.0%
 
0150.0%
 

Most frequent Nonspacing Mark characters

ValueCountFrequency (%) 
́1100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin5710491.7%
 
Common51838.3%
 
Cyrillic11< 0.1%
 
Inherited1< 0.1%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e584710.2%
 
a571710.0%
 
n45808.0%
 
r39076.8%
 
i38616.8%
 
o34416.0%
 
l31685.5%
 
t22353.9%
 
s21753.8%
 
h18423.2%
 
d13462.4%
 
y12462.2%
 
m12262.1%
 
c11602.0%
 
u11502.0%
 
M8901.6%
 
J8121.4%
 
g7651.3%
 
C7521.3%
 
S7481.3%
 
B7441.3%
 
k6421.1%
 
D5851.0%
 
R5340.9%
 
A5260.9%
 
Other values (63)720512.6%
 

Most frequent Common characters

ValueCountFrequency (%) 
501096.7%
 
-741.4%
 
.731.4%
 
'230.4%
 
51< 0.1%
 
01< 0.1%
 
,1< 0.1%
 

Most frequent Cyrillic characters

ValueCountFrequency (%) 
и327.3%
 
Ю19.1%
 
л19.1%
 
я19.1%
 
С19.1%
 
н19.1%
 
г19.1%
 
р19.1%
 
ь19.1%
 

Most frequent Inherited characters

ValueCountFrequency (%) 
́1100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII6211699.7%
 
None1670.3%
 
Cyrillic11< 0.1%
 
Latin Ext Additional4< 0.1%
 
Diacriticals1< 0.1%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
e58479.4%
 
a57179.2%
 
50108.1%
 
n45807.4%
 
r39076.3%
 
i38616.2%
 
o34415.5%
 
l31685.1%
 
t22353.6%
 
s21753.5%
 
h18423.0%
 
d13462.2%
 
y12462.0%
 
m12262.0%
 
c11601.9%
 
u11501.9%
 
M8901.4%
 
J8121.3%
 
g7651.2%
 
C7521.2%
 
S7481.2%
 
B7441.2%
 
k6421.0%
 
D5850.9%
 
R5340.9%
 
Other values (34)773312.4%
 

Most frequent None characters

ValueCountFrequency (%) 
é6136.5%
 
á1810.8%
 
í169.6%
 
ñ95.4%
 
ë84.8%
 
ü84.8%
 
å53.0%
 
ó53.0%
 
ç42.4%
 
ø31.8%
 
è31.8%
 
ö31.8%
 
Å21.2%
 
Đ21.2%
 
ć21.2%
 
ú21.2%
 
à10.6%
 
ș10.6%
 
ä10.6%
 
î10.6%
 
û10.6%
 
ı10.6%
 
ğ10.6%
 
ū10.6%
 
ß10.6%
 
Other values (7)74.2%
 

Most frequent Cyrillic characters

ValueCountFrequency (%) 
и327.3%
 
Ю19.1%
 
л19.1%
 
я19.1%
 
С19.1%
 
н19.1%
 
г19.1%
 
р19.1%
 
ь19.1%
 

Most frequent Latin Ext Additional characters

ValueCountFrequency (%) 
125.0%
 
125.0%
 
125.0%
 
ế125.0%
 

Most frequent Diacriticals characters

ValueCountFrequency (%) 
́1100.0%
 

actor_3_name
Categorical

HIGH CARDINALITY

Distinct: 3373
Distinct (%): 70.2%
Missing: 0
Missing (%): 0.0%
Memory size: 37.6 KiB

UNK: 93
Woody Harrelson: 10
David Koechner: 8
Vincent D'Onofrio: 8
Alfred Molina: 8
Other values (3368): 4676
ValueCountFrequency (%) 
UNK931.9%
 
Woody Harrelson100.2%
 
David Koechner80.2%
 
Vincent D'Onofrio80.2%
 
Alfred Molina80.2%
 
Jim Broadbent80.2%
 
Goran Visnjic70.1%
 
Viola Davis70.1%
 
Willem Dafoe70.1%
 
Bill Murray70.1%
 
Christopher Plummer70.1%
 
James Doohan70.1%
 
William H. Macy70.1%
 
Robin Wright70.1%
 
Judi Dench60.1%
 
Sam Shepard60.1%
 
Bill Paxton60.1%
 
Amy Adams60.1%
 
Stanley Tucci60.1%
 
Gabrielle Union60.1%
 
Ving Rhames60.1%
 
Scott Glenn60.1%
 
Maggie Gyllenhaal60.1%
 
Steve Buscemi60.1%
 
John C. Reilly60.1%
 
Other values (3348)454694.6%
 
Frequencies of value counts

Unique

Unique: 2586
Unique (%): 53.8%
Histogram of lengths of the category

Length

Max length: 27
Median length: 13
Mean length: 12.98334374
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 96
Unique unicode categories: 7
Unique unicode scripts: 3
Unique unicode blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
e57329.2%
 
a56359.0%
 
50658.1%
 
n45587.3%
 
i40176.4%
 
r38266.1%
 
o34075.5%
 
l32655.2%
 
s22213.6%
 
t21253.4%
 
h17862.9%
 
d13102.1%
 
y12522.0%
 
c12232.0%
 
u11941.9%
 
m11871.9%
 
M8841.4%
 
J8231.3%
 
S7761.2%
 
C7411.2%
 
B7331.2%
 
g7251.2%
 
k6601.1%
 
D6591.1%
 
R6181.0%
 
Other values (71)793712.7%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter4671374.9%
 
Uppercase Letter1029016.5%
 
Space Separator50658.1%
 
Other Punctuation1980.3%
 
Dash Punctuation810.1%
 
Other Letter10< 0.1%
 
Decimal Number2< 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
M8848.6%
 
J8238.0%
 
S7767.5%
 
C7417.2%
 
B7337.1%
 
D6596.4%
 
R6186.0%
 
A5745.6%
 
K5004.9%
 
L4934.8%
 
H4494.4%
 
G4414.3%
 
P4274.1%
 
T3743.6%
 
W3723.6%
 
E3103.0%
 
N2692.6%
 
F2382.3%
 
V1461.4%
 
U1271.2%
 
O1221.2%
 
I860.8%
 
Y460.4%
 
Z450.4%
 
Q180.2%
 
Other values (7)190.2%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e573212.3%
 
a563512.1%
 
n45589.8%
 
i40178.6%
 
r38268.2%
 
o34077.3%
 
l32657.0%
 
s22214.8%
 
t21254.5%
 
h17863.8%
 
d13102.8%
 
y12522.7%
 
c12232.6%
 
u11942.6%
 
m11872.5%
 
g7251.6%
 
k6601.4%
 
b5291.1%
 
v4611.0%
 
p3680.8%
 
w3550.8%
 
f2990.6%
 
z2480.5%
 
x990.2%
 
j680.1%
 
Other values (24)1630.3%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
5065100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
.14573.2%
 
'4824.2%
 
,31.5%
 
"21.0%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-81100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
4150.0%
 
0150.0%
 

Most frequent Other Letter characters

ValueCountFrequency (%) 
ی220.0%
 
م220.0%
 
ا220.0%
 
پ110.0%
 
ن110.0%
 
ع110.0%
 
د110.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin5700391.4%
 
Common53468.6%
 
Arabic10< 0.1%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e573210.1%
 
a56359.9%
 
n45588.0%
 
i40177.0%
 
r38266.7%
 
o34076.0%
 
l32655.7%
 
s22213.9%
 
t21253.7%
 
h17863.1%
 
d13102.3%
 
y12522.2%
 
c12232.1%
 
u11942.1%
 
m11872.1%
 
M8841.6%
 
J8231.4%
 
S7761.4%
 
C7411.3%
 
B7331.3%
 
g7251.3%
 
k6601.2%
 
D6591.2%
 
R6181.1%
 
A5741.0%
 
Other values (56)707212.4%
 

Most frequent Common characters

ValueCountFrequency (%) 
506594.7%
 
.1452.7%
 
-811.5%
 
'480.9%
 
,30.1%
 
"2< 0.1%
 
41< 0.1%
 
01< 0.1%
 

Most frequent Arabic characters

ValueCountFrequency (%) 
ی220.0%
 
م220.0%
 
ا220.0%
 
پ110.0%
 
ن110.0%
 
ع110.0%
 
د110.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII6220999.8%
 
None1400.2%
 
Arabic10< 0.1%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
e57329.2%
 
a56359.1%
 
50658.1%
 
n45587.3%
 
i40176.5%
 
r38266.2%
 
o34075.5%
 
l32655.2%
 
s22213.6%
 
t21253.4%
 
h17862.9%
 
d13102.1%
 
y12522.0%
 
c12232.0%
 
u11941.9%
 
m11871.9%
 
M8841.4%
 
J8231.3%
 
S7761.2%
 
C7411.2%
 
B7331.2%
 
g7251.2%
 
k6601.1%
 
D6591.1%
 
R6181.0%
 
Other values (35)778712.5%
 

Most frequent None characters

Value             Count  Frequency (%)
é                 49     35.0%
á                 13     9.3%
í                 12     8.6%
ë                 7      5.0%
ñ                 7      5.0%
å                 5      3.6%
ó                 5      3.6%
ö                 5      3.6%
ç                 4      2.9%
è                 4      2.9%
Á                 3      2.1%
ø                 3      2.1%
Ó                 2      1.4%
ō                 2      1.4%
ä                 2      1.4%
ń                 2      1.4%
ü                 2      1.4%
ć                 2      1.4%
ș                 1      0.7%
à                 1      0.7%
Å                 1      0.7%
ô                 1      0.7%
ı                 1      0.7%
É                 1      0.7%
č                 1      0.7%
Other values (4)  4      2.9%

Most frequent Arabic characters

Value  Count  Frequency (%)
ی      2      20.0%
م      2      20.0%
ا      2      20.0%
پ      1      10.0%
ن      1      10.0%
ع      1      10.0%
د      1      10.0%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear relationship the steepness of the line, that is, its angle to the x-axis, does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
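The calculation described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the code used to produce this report; the function name `pearson_r` and the sample lists are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical sample data, not taken from the dataset in this report.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson_r(x, y))

# Shifting and rescaling y by a positive factor leaves r unchanged.
print(pearson_r(x, [3 * v + 5 for v in y]))
```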

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
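As a rough sketch of that recipe: rank both variables, then apply the Pearson formula to the ranks. The helper names below are hypothetical, and giving tied values their average rank is an assumption (a common convention), not something this report states.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y = x**2 is monotonic but not linear in x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman_rho(x, y))  # monotonic relationship
print(pearson_r(x, y))     # lower, because the relationship is not linear
```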

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
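The pair-counting definition above corresponds to the variant usually called τ-a. A minimal sketch follows; the function name and data are hypothetical, and statistical libraries often report the tie-adjusted τ-b instead, which can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y order the two observations the same way.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Hypothetical data: two of the three pairs are concordant, one is discordant.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```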

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
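For illustration, the bias correction as it is commonly stated can be sketched as follows. This is not necessarily the exact code behind this report; the function name and the contingency tables are hypothetical, and the sketch assumes at least two rows and two columns with no empty row or column.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r = len(table)
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected bias of phi2 and shrink the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

# Hypothetical 2x2 tables: perfect association and exact independence.
print(cramers_v_corrected([[50, 0], [0, 50]]))
print(cramers_v_corrected([[25, 25], [25, 25]]))
```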

Missing values


Sample

First rows

Unnamed: 0, budget, genres, homepage, id, plot_keywords, language, original_title, overview, popularity, production_companies, production_countries, release_date, gross, duration, spoken_languages, status, tagline, movie_title, vote_average, num_voted_users, title_year, country, director_name, actor_1_name, actor_2_name, actor_3_name
00237000000Action|Adventure|Fantasy|Science Fictionhttp://www.avatarmovie.com/19995culture clash|future|space war|space colony|society|space travel|futuristic|romance|space|alien|tribe|alien planet|cgi|marine|soldier|battle|love affair|anti war|power relations|mind and soul|3dEnglishAvatarIn the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization.150.437577[{'name': 'Ingenious Film Partners', 'id': 289}, {'name': 'Twentieth Century Fox Film Corporation', 'id': 306}, {'name': 'Dune Entertainment', 'id': 444}, {'name': 'Lightstorm Entertainment', 'id': 574}][{'iso_3166_1': 'US', 'name': 'United States of America'}, {'iso_3166_1': 'GB', 'name': 'United Kingdom'}]2009-12-102787965087162[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'es', 'name': 'Español'}]ReleasedEnter the World of Pandora.Avatar7.2118002009United States of AmericaJames CameronZoe SaldanaSigourney WeaverStephen Lang
11300000000Adventure|Fantasy|Actionhttp://disney.go.com/disneypictures/pirates/285ocean|drug abuse|exotic island|east india trading company|love of one's life|traitor|shipwreck|strong woman|ship|alliance|calypso|afterlife|fighter|pirate|swashbuckler|aftercreditsstingerEnglishPirates of the Caribbean: At World's EndCaptain Barbossa, long believed to be dead, has come back to life and is headed to the edge of the Earth with Will Turner and Elizabeth Swann. But nothing is quite as it seems.139.082615[{'name': 'Walt Disney Pictures', 'id': 2}, {'name': 'Jerry Bruckheimer Films', 'id': 130}, {'name': 'Second Mate Productions', 'id': 19936}][{'iso_3166_1': 'US', 'name': 'United States of America'}]2007-05-19961000000169[{'iso_639_1': 'en', 'name': 'English'}]ReleasedAt the end of the world, the adventure begins.Pirates of the Caribbean: At World's End6.945002007United States of AmericaGore VerbinskiOrlando BloomKeira KnightleyStellan Skarsgård
22245000000Action|Adventure|Crimehttp://www.sonypictures.com/movies/spectre/206647spy|based on novel|secret agent|sequel|mi6|british secret service|united kingdomFrançaisSpectreA cryptic message from Bond’s past sends him on a trail to uncover a sinister organization. While M battles political forces to keep the secret service alive, Bond peels back the layers of deceit to reveal the terrible truth behind SPECTRE.107.376788[{'name': 'Columbia Pictures', 'id': 5}, {'name': 'Danjaq', 'id': 10761}, {'name': 'B24', 'id': 69434}][{'iso_3166_1': 'GB', 'name': 'United Kingdom'}, {'iso_3166_1': 'US', 'name': 'United States of America'}]2015-10-26880674609148[{'iso_639_1': 'fr', 'name': 'Français'}, {'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'es', 'name': 'Español'}, {'iso_639_1': 'it', 'name': 'Italiano'}, {'iso_639_1': 'de', 'name': 'Deutsch'}]ReleasedA Plan No One EscapesSpectre6.344662015United KingdomSam MendesChristoph WaltzLéa SeydouxRalph Fiennes
33250000000Action|Crime|Drama|Thrillerhttp://www.thedarkknightrises.com/49026dc comics|crime fighter|terrorist|secret identity|burglar|hostage drama|time bomb|gotham city|vigilante|cover-up|superhero|villainess|tragic hero|terrorism|destruction|catwoman|cat burglar|imax|flood|criminal underworld|batmanEnglishThe Dark Knight RisesFollowing the death of District Attorney Harvey Dent, Batman assumes responsibility for Dent's crimes to protect the late attorney's reputation and is subsequently hunted by the Gotham City Police Department. Eight years later, Batman encounters the mysterious Selina Kyle and the villainous Bane, a new terrorist leader who overwhelms Gotham's finest. The Dark Knight resurfaces to protect a city that has branded him an enemy.112.312950[{'name': 'Legendary Pictures', 'id': 923}, {'name': 'Warner Bros.', 'id': 6194}, {'name': 'DC Entertainment', 'id': 9993}, {'name': 'Syncopy', 'id': 9996}][{'iso_3166_1': 'US', 'name': 'United States of America'}]2012-07-161084939099165[{'iso_639_1': 'en', 'name': 'English'}]ReleasedThe Legend EndsThe Dark Knight Rises7.691062012United States of AmericaChristopher NolanMichael CaineGary OldmanAnne Hathaway
44260000000Action|Adventure|Science Fictionhttp://movies.disney.com/john-carter49529based on novel|mars|medallion|space travel|princess|alien|steampunk|martian|escape|edgar rice burroughs|alien race|superhuman strength|mars civilization|sword and planet|19th century|3dEnglishJohn CarterJohn Carter is a war-weary, former military captain who's inexplicably transported to the mysterious and exotic planet of Barsoom (Mars) and reluctantly becomes embroiled in an epic conflict. It's a world on the brink of collapse, and Carter rediscovers his humanity when he realizes the survival of Barsoom and its people rests in his hands.43.926995[{'name': 'Walt Disney Pictures', 'id': 2}][{'iso_3166_1': 'US', 'name': 'United States of America'}]2012-03-07284139100132[{'iso_639_1': 'en', 'name': 'English'}]ReleasedLost in our world, found in another.John Carter6.121242012United States of AmericaAndrew StantonLynn CollinsSamantha MortonWillem Dafoe
55258000000Fantasy|Action|Adventurehttp://www.sonypictures.com/movies/spider-man3/559dual identity|amnesia|sandstorm|love of one's life|forgiveness|spider|wretch|death of a friend|egomania|sand|narcism|hostility|marvel comic|sequel|superhero|revengeEnglishSpider-Man 3The seemingly invincible Spider-Man goes up against an all-new crop of villain – including the shape-shifting Sandman. While Spider-Man’s superpowers are altered by an alien organism, his alter ego, Peter Parker, deals with nemesis Eddie Brock and also gets caught up in a love triangle.115.699814[{'name': 'Columbia Pictures', 'id': 5}, {'name': 'Laura Ziskin Productions', 'id': 326}, {'name': 'Marvel Enterprises', 'id': 19551}][{'iso_3166_1': 'US', 'name': 'United States of America'}]2007-05-01890871626139[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'fr', 'name': 'Français'}]ReleasedThe battle within.Spider-Man 35.935762007United States of AmericaSam RaimiKirsten DunstJames FrancoThomas Haden Church
66260000000Animation|Familyhttp://disney.go.com/disneypictures/tangled/38757hostage|magic|horse|fairy tale|musical|princess|animation|tower|blonde woman|selfishness|healing power|based on fairy tale|duringcreditsstinger|healing gift|animal sidekickEnglishTangledWhen the kingdom's most wanted-and most charming-bandit Flynn Rider hides out in a mysterious tower, he's taken hostage by Rapunzel, a beautiful and feisty tower-bound teen with 70 feet of magical, golden hair. Flynn's curious captor, who's looking for her ticket out of the tower where she's been locked away for years, strikes a deal with the handsome thief and the unlikely duo sets off on an action-packed escapade, complete with a super-cop horse, an over-protective chameleon and a gruff gang of pub thugs.48.681969[{'name': 'Walt Disney Pictures', 'id': 2}, {'name': 'Walt Disney Animation Studios', 'id': 6125}][{'iso_3166_1': 'US', 'name': 'United States of America'}]2010-11-24591794936100[{'iso_639_1': 'en', 'name': 'English'}]ReleasedThey're taking adventure to new lengths.Tangled7.433302010United States of AmericaByron HowardMandy MooreDonna MurphyRon Perlman
77280000000Action|Adventure|Science Fictionhttp://marvel.com/movies/movie/193/avengers_age_of_ultron99861marvel comic|sequel|superhero|based on comic book|vision|superhero team|duringcreditsstinger|marvel cinematic universe|3dEnglishAvengers: Age of UltronWhen Tony Stark tries to jumpstart a dormant peacekeeping program, things go awry and Earth’s Mightiest Heroes are put to the ultimate test as the fate of the planet hangs in the balance. As the villainous Ultron emerges, it is up to The Avengers to stop him from enacting his terrible plans, and soon uneasy alliances and unexpected action pave the way for an epic and unique global adventure.134.279229[{'name': 'Marvel Studios', 'id': 420}, {'name': 'Prime Focus', 'id': 15357}, {'name': 'Revolution Sun Studios', 'id': 76043}][{'iso_3166_1': 'US', 'name': 'United States of America'}]2015-04-221405403694141[{'iso_639_1': 'en', 'name': 'English'}]ReleasedA New Age Has Come.Avengers: Age of Ultron7.367672015United States of AmericaJoss WhedonChris HemsworthMark RuffaloChris Evans
88250000000Adventure|Fantasy|Familyhttp://harrypotter.warnerbros.com/harrypotterandthehalf-bloodprince/dvd/index.html767witch|magic|broom|school of witchcraft|wizardry|apparition|teenage crush|werewolfEnglishHarry Potter and the Half-Blood PrinceAs Harry begins his sixth year at Hogwarts, he discovers an old book marked as 'Property of the Half-Blood Prince', and begins to learn more about Lord Voldemort's dark past.98.885637[{'name': 'Warner Bros.', 'id': 6194}, {'name': 'Heyday Films', 'id': 7364}][{'iso_3166_1': 'GB', 'name': 'United Kingdom'}, {'iso_3166_1': 'US', 'name': 'United States of America'}]2009-07-07933959197153[{'iso_639_1': 'en', 'name': 'English'}]ReleasedDark Secrets RevealedHarry Potter and the Half-Blood Prince7.452932009United KingdomDavid YatesRupert GrintEmma WatsonTom Felton
99250000000Action|Adventure|Fantasyhttp://www.batmanvsupermandawnofjustice.com/209112dc comics|vigilante|superhero|based on comic book|revenge|super powers|clark kent|bruce wayne|dc extended universeEnglishBatman v Superman: Dawn of JusticeFearing the actions of a god-like Super Hero left unchecked, Gotham City’s own formidable, forceful vigilante takes on Metropolis’s most revered, modern-day savior, while the world wrestles with what sort of hero it really needs. And with Batman and Superman at war with one another, a new threat quickly arises, putting mankind in greater danger than it’s ever known before.155.790452[{'name': 'DC Comics', 'id': 429}, {'name': 'Atlas Entertainment', 'id': 507}, {'name': 'Warner Bros.', 'id': 6194}, {'name': 'DC Entertainment', 'id': 9993}, {'name': 'Cruel & Unusual Films', 'id': 9995}, {'name': 'RatPac-Dune Entertainment', 'id': 41624}][{'iso_3166_1': 'US', 'name': 'United States of America'}]2016-03-23873260194151[{'iso_639_1': 'en', 'name': 'English'}]ReleasedJustice or revengeBatman v Superman: Dawn of Justice5.770042016United States of AmericaZack SnyderHenry CavillGal GadotAmy Adams

Last rows

Unnamed: 0, budget, genres, homepage, id, plot_keywords, language, original_title, overview, popularity, production_companies, production_countries, release_date, gross, duration, spoken_languages, status, tagline, movie_title, vote_average, num_voted_users, title_year, country, director_name, actor_1_name, actor_2_name, actor_3_name
479347930DramaUNK182291confession|hazing|gang member|latino|lgbt|catholic priest|shakespeare's romeo and juliet|latino lgbt|gang initiation|gunplayUNKOn The DownlowIsaac and Angel are two young Latinos involved in a south side Chicago gang. They have a secret in a world where secrets are forbidden.0.029757[{'name': 'Iconoclast Films', 'id': 26677}][{'iso_3166_1': 'US', 'name': 'United States of America'}]2004-04-11090[]ReleasedTwo gangs. One secret. One crossroad.On The Downlow6.022004United States of AmericaTadeo GarciaMichael CortezDonato CruzFelipe Camacho
479447940Thriller|Horror|ComedyUNK286939UNKEnglishSanctuary: Quite a ConundrumIt should have been just a normal day of sex, fun, alcohol, hormones and debauchery for Tabitha and Mimi, two over-privileged twenty-somethings. But that so-called normalcy gets tossed out the window when a devastating event occurs at a pool party.0.166513[{'name': 'Gold Lion Films', 'id': 37870}, {'name': 'T-Street Productions', 'id': 37871}][{'iso_3166_1': 'US', 'name': 'United States of America'}]2012-01-20082[{'iso_639_1': 'en', 'name': 'English'}]ReleasedUNKSanctuary: Quite a Conundrum0.002012United States of AmericaThomas L. PhillipsErin ClineEmily RogersAnthony Rutowicz
479547950DramaUNK124606gang|audition|police fake|homeless|actressEnglishBangA young woman in L.A. is having a bad day: she's evicted, an audition ends with a producer furious she won't trade sex for the part, and a policeman nabs her for something she didn't do, demanding fellatio to release her. She snaps, grabs his gun, takes his uniform, and leaves him cuffed to a tree where he's soon having a defenseless chat with a homeless man. She takes off on the cop's motorcycle and, for an afternoon, experiences a cop's life. She talks a young man out of suicide and then is plunged into violence after a friendly encounter with two "vatos." She is torn between self-protection and others' expectations. Is there any resolution for her torrent of feelings?0.918116[{'name': 'Asylum Films', 'id': 10571}, {'name': 'FM Entertainment', 'id': 26598}, {'name': 'Eagle Eye Films Inc.', 'id': 40739}][{'iso_3166_1': 'US', 'name': 'United States of America'}]1995-09-09098[{'iso_639_1': 'en', 'name': 'English'}]ReleasedSometimes you've got to break the rulesBang6.011995United States of AmericaAsh Baron-CohenPeter GreeneMichael NewlandErik Schrody
479647967000Science Fiction|Drama|Thrillerhttp://www.primermovie.com14337distrust|garage|identity crisis|time travel|time machine|mathematics|independent film|paradox|mechanical engineeringEnglishPrimerFriends/fledgling entrepreneurs invent a device in their garage that reduces the apparent mass of any object placed inside it, but they accidentally discover that it has some highly unexpected capabilities -- ones that could enable them to do and to have seemingly anything they want. Taking advantage of this unique opportunity is the first challenge they face. Dealing with the consequences is the next.23.307949[{'name': 'Thinkfilm', 'id': 446}][{'iso_3166_1': 'US', 'name': 'United States of America'}]2004-10-0842476077[{'iso_639_1': 'en', 'name': 'English'}]ReleasedWhat happens if it actually works?Primer6.96582004United States of AmericaShane CarruthDavid SullivanCasey GoodenAnand Upadhyaya
479747970Foreign|ThrillerUNK67238UNKUNKCaviteAdam, a security guard, travels from California to the Philippines, his native land, for his father's funeral. He arrives in Manila. As he waits, a phone rings in his backpack; he answers it, and a male voice tells him that his mother and sister are captives and will be killed if Adam doesn't cooperate. Over the next hour, the voice sends Adam by bus, taxi, motorized tricycle, and on foot through an urban landscape of busy streets, cramped apartments, a fetid squatters' camp, a bank, a cockfighting arena, and a church. Adam's conversations with the voice cover murder, Islam, jihad, rebellion in Mindanao, and his family. What is it Adam will be commanded to do?0.022173[][]2005-03-12080[]ReleasedUNKCavite7.522005UNKNeill Dela LlanaUNKUNKUNK
47984798220000Action|Crime|ThrillerUNK9367united states–mexico barrier|legs|arms|paper knife|guitar caseEspañolEl MariachiEl Mariachi just wants to play his guitar and carry on the family tradition. Unfortunately, the town he tries to find work in has another visitor...a killer who carries his guns in a guitar case. The drug lord and his henchmen mistake El Mariachi for the killer, Azul, and chase him around town trying to kill him and get his guitar case.14.269792[{'name': 'Columbia Pictures', 'id': 5}][{'iso_3166_1': 'MX', 'name': 'Mexico'}, {'iso_3166_1': 'US', 'name': 'United States of America'}]1992-09-04204092081[{'iso_639_1': 'es', 'name': 'Español'}]ReleasedHe didn't come looking for trouble, but trouble came looking for him.El Mariachi6.62381992MexicoRobert RodriguezJaime de HoyosPeter MarquardtReinol Martinez
479947999000Comedy|RomanceUNK72766UNKUNKNewlywedsA newlywed couple's honeymoon is upended by the arrivals of their respective sisters.0.642552[][]2011-12-26085[]ReleasedA newlywed couple's honeymoon is upended by the arrivals of their respective sisters.Newlyweds5.952011UNKEdward BurnsKerry BishéMarsha DietleinCaitlin Fitzgerald
480048000Comedy|Drama|Romance|TV Moviehttp://www.hallmarkchannel.com/signedsealeddelivered231617date|love at first sight|narration|investigation|team|postal workerEnglishSigned, Sealed, Delivered"Signed, Sealed, Delivered" introduces a dedicated quartet of civil servants in the Dead Letter Office of the U.S. Postal System who transform themselves into an elite team of lost-mail detectives. Their determination to deliver the seemingly undeliverable takes them out of the post office into an unpredictable world where letters and packages from the past save lives, solve crimes, reunite old loves, and change futures by arriving late, but always miraculously on time.1.444476[{'name': 'Front Street Pictures', 'id': 3958}, {'name': 'Muse Entertainment Enterprises', 'id': 6438}][{'iso_3166_1': 'US', 'name': 'United States of America'}]2013-10-130120[{'iso_639_1': 'en', 'name': 'English'}]ReleasedUNKSigned, Sealed, Delivered7.062013United States of AmericaScott SmithKristin BoothCrystal LoweGeoff Gustafson
480148010UNKhttp://shanghaicalling.com/126186UNKEnglishShanghai CallingWhen ambitious New York attorney Sam is sent to Shanghai on assignment, he immediately stumbles into a legal mess that could end his career. With the help of a beautiful relocation specialist, a well-connected old-timer, a clever journalist, and a street-smart legal assistant, Sam might just save his job, find romance, and learn to appreciate the beauty and wonders of Shanghai. Written by Anonymous (IMDB.com).0.857008[][{'iso_3166_1': 'US', 'name': 'United States of America'}, {'iso_3166_1': 'CN', 'name': 'China'}]2012-05-03098[{'iso_639_1': 'en', 'name': 'English'}]ReleasedA New Yorker in ShanghaiShanghai Calling5.772012United States of AmericaDaniel HsiaEliza CoupeBill PaxtonAlan Ruck
480248020DocumentaryUNK25975obsession|camcorder|crush|dream girlEnglishMy Date with DrewEver since the second grade when he first saw her in E.T. The Extraterrestrial, Brian Herzlinger has had a crush on Drew Barrymore. Now, 20 years later he's decided to try to fulfill his lifelong dream by asking her for a date. There's one small problem: She's Drew Barrymore and he's, well, Brian Herzlinger, a broke 27-year-old aspiring filmmaker from New Jersey.1.929883[{'name': 'rusty bear entertainment', 'id': 87986}, {'name': 'lucky crow films', 'id': 87987}][{'iso_3166_1': 'US', 'name': 'United States of America'}]2005-08-05090[{'iso_639_1': 'en', 'name': 'English'}]ReleasedUNKMy Date with Drew6.3162005United States of AmericaBrian HerzlingerBrian HerzlingerCorey FeldmanEric Roberts